Abstract

Particle Swarm Optimization (PSO) is a nature-inspired meta-heuristic for solving continuous
optimization problems. In [16, 18], the potential of the particles of a swarm has been used to show
that a slightly modified PSO guarantees convergence to local optima. Here we show that under
specific circumstances the unmodified PSO, even with swarm parameters known (from the
literature) to be “good”, almost surely does not converge to a local optimum. This
undesirable phenomenon is called stagnation. For this purpose, the particles’ potential in each
dimension is analyzed mathematically. Additionally, some reasonable assumptions on the behavior
of the particles’ potential are made. Depending on the objective function and, interestingly, the
number of particles, the potential in some dimensions may decrease much faster than in other
dimensions. Therefore, these dimensions lose relevance, i.e., the contribution of their entries to the
decisions about attractor updates becomes insignificant and, with positive probability, they never
regain relevance. If Brownian Motion is assumed to be an approximation of the time-dependent
drop of potential, practical, i.e., large values for this probability are calculated. Finally, on chosen
multidimensional polynomials of degree two, experiments are provided showing that the required
circumstances occur quite frequently. Furthermore, experiments are provided showing that even
when the very simple sphere function is processed the described stagnation phenomenon occurs.
Consequently, unmodified PSO does not converge to any local optimum of the chosen functions for
tested parameter settings.
1 Introduction


Particle swarm optimization (PSO), introduced by Kennedy and Eberhart [9, 6], is a very popular
nature-inspired meta-heuristic for solving continuous optimization problems. Fields of very
successful application are, among many others, Biomedical Image Processing [21], Geosciences,
and Materials Science, where the continuous objective function on a multi-dimensional domain
is not given in a closed form, but by a “black box”. The popularity of the PSO framework is due to
the fact that, on the one hand, it can be realized and, if necessary, adapted to further needs easily,
while on the other hand it shows good performance in experiments with respect to the quality
of the obtained solution and the speed needed to obtain it. A thorough discussion of PSO can be
found in [12].

To be precise, let an objective function $f : \mathbb{R}^D \to \mathbb{R}$ on a $D$-dimensional domain be given
that (w.l.o.g.) has to be minimized. A population of particles, each consisting of a position (the
candidate for a solution), a velocity and a local attractor, moves through the search space $\mathbb{R}^D$. The
local attractor of a particle is the best position with respect to $f$ this particle has encountered so
far. The best of all local attractors is the global attractor. The movement of a particle is governed
by so-called movement equations that depend on both the particle’s velocity and its two attractors
and on some additional fixed algorithm parameters. The pseudo code of the PSO approach is
given in Algorithm 1. Additionally, Definition 1 captures the PSO behavior mathematically
as a stochastic process. The population in motion is called the swarm.

There are guidelines known for the “good” choice of the fixed parameters that control the 
impact of the current velocity and the attractors on the updated velocity of a particle ([201 El) 
such that the swarm provably converges to a particular point in the search space (under some 
reasonable assumptions). However, the point of convergence is not necessarily a global optimum. 
Local optima might also be considered acceptable, but unfortunately it is possible that the point 
of convergence is not even a local optimum. In the latter case, one says that the swarm stagnates. 
Examples are presented in Sec. 4. E.g., with established good parameter settings, stagnation can be
observed if 3 particles work on the 10-dimensional sphere function. In [10], Lehre and Witt show
that there are non-trivial bad parameter settings and initial configurations such that PSO possibly
converges at arbitrary points when processing the one-dimensional sphere function. However, their
result is presented for populations of 1 and 2 particles only. Additionally, their parameter settings
considerably deviate from those generally considered as good.

In [16, 18], the notion of the potential of the swarm has been introduced. This potential has
been used to prove that in the one-dimensional search space the swarm almost surely (in the
mathematical sense) finds a local optimum. For $D \geq 2$, the movement equation has been adapted
slightly to prevent the swarm’s potential from dropping too close to 0, thereby avoiding stagnation at
a non-optimal position. Consequently, this version of PSO almost surely finds local optima.

A comprehensive overview of theoretical results concerning PSO can be found in [15].

The phenomenon of convergence to a point that is not a local optimum has, to the best of our
knowledge, not yet been formally investigated in a setting which is not generally restricted to a
number of particles or specific parameters for the PSO. In this paper, reasons for the phenomenon
of stagnation are given. To this end, the notion of the potential is modified such that it is now aware of
how the particles experience the objective function $f$. A theoretical model is provided which on the
one hand uses reasonable assumptions and on the other hand provides a basis to mathematically
prove that the swarm almost surely stagnates. The model captures the observation that during the
execution of PSO on objective functions $f$ the contribution of some dimensions to the potential
decreases exponentially faster than the contribution of other dimensions. This does not generally
imply that PSO faces the problem of stagnation, but if the model is applicable, which is often the
case when few particles are used, then this model supplies indications whether in a specific setting
stagnation is present or not. The assumptions which need to be applied for the model will be
justified in the experimental Section 4.

The first model is used to prove the statement that finally the swarm stagnates indefinitely
almost surely. In that model we can prove that the conditional probability that stagnation remains
indefinitely, given that stagnation emerges at some time $T$, is positive. If the time-dependent drop of potential
is additionally approximated by a Brownian Motion with drift, then the described probability can
be calculated by an explicit formula. The values obtained from that formula coincide very well
with empirically measured probabilities in experiments. If stagnation remains indefinitely, mainly
poor solutions will be returned by PSO.

Experimentally it is shown that the described separation of potential indeed occurs when PSO is
run with certain numbers of particles on some popular functions from the CEC benchmark set [19]
and an additional function.

This paper is organized as follows: After the formal definition of the PSO process in Section 2,
the new notion of potential, results of experiments and the proof that under realistic
assumptions PSO almost surely stagnates are presented in Section 3. In Section 4, an experimental
analysis of three benchmark functions from the CEC benchmark set [19] and an additional function is
presented.
2 Definitions


First the model which is going to be used for our analysis of the PSO algorithm is presented.
Algorithm 1 represents the pseudo code of the classical PSO algorithm. No bound handling strategies
are investigated, because they have almost no influence on the convergence if the swarm is
converging to a point not on the boundaries. Similar to [16, 18], the model describes the positions of the
particles, the velocities and the global and local attractors as real-valued stochastic processes.
Basic mathematical tools from probability theory which are needed for this analysis can be found
in, e.g., [3].

Definition 1 (Classical PSO process). A swarm of $N$ particles moves through the $D$-dimensional
search space $\mathbb{R}^D$. Let $f : \mathbb{R}^D \to \mathbb{R}$ be the objective function. At each time $t \in \mathbb{N}$, each particle $n$ has
a position $X_t^n \in \mathbb{R}^D$, a velocity $V_t^n \in \mathbb{R}^D$ and a local attractor $L_t^n \in \mathbb{R}^D$, the best position particle
$n$ has visited until time $t$. Additionally, the swarm shares its global attractor $G_t^n \in \mathbb{R}^D$, storing
the best position any particle has visited until the $t$’th step of particle $n$, i.e., since each particle’s
update of the global attractor is immediately visible for the next particle, it can have several values
in the same iteration. When necessary, $X_t^{n,d}$ is written for the $d$’th component of $X_t^n$ ($V_t^{n,d}$ etc.
analogously). With a given distribution for $(X_0, V_0)$, $(X_{t+1}, V_{t+1}, L_{t+1}, G_{t+1})$ is determined by the
following recursive equations that are called the movement equations:

\[
\begin{aligned}
L_0^n &:= X_0^n && \text{for } 1 \leq n \leq N, \\
G_t^n &:= \operatorname*{argmin}_{x \in \{L_{t+1}^{n'} : n' < n\} \cup \{L_t^{n'} : n' \geq n\}} f(x) && \text{for } t \geq 0,\ 1 \leq n \leq N, \\
V_{t+1}^n &:= \chi \cdot V_t^n + c_1 \cdot r_t^n \odot (L_t^n - X_t^n) + c_2 \cdot s_t^n \odot (G_t^n - X_t^n) && \text{for } t \geq 0,\ 1 \leq n \leq N, \\
X_{t+1}^n &:= X_t^n + V_{t+1}^n && \text{for } t \geq 0,\ 1 \leq n \leq N, \\
L_{t+1}^n &:= \operatorname*{argmin}_{x \in \{X_{t+1}^n,\, L_t^n\}} f(x) && \text{for } t \geq 0,\ 1 \leq n \leq N.
\end{aligned}
\]

Here, $\chi$, $c_1$ and $c_2$ are some positive constants called the fixed parameters of the swarm, and $r_t^n$,
$s_t^n$ are uniformly distributed over $[0,1]^D$ and all independent. $\odot$ is meant as the item-wise product
of two vectors. The result is then a vector as well.

The underlying probability space is called $(\Omega, \mathcal{A}, P)$. The $\sigma$-algebra $\mathcal{A}$ is equal to the
$\sigma$-algebra generated by the family of $\sigma$-algebras $(\mathcal{A}_t)_{t \in \mathbb{N}}$ to which $(X_t, V_t, L_t)_{t \in \mathbb{N}}$ is adapted.

One can think of $\mathcal{A}_t$ as the mathematical object carrying the information known at time $t$.
Mainly $\mathcal{A}_t$ refers to the product $\sigma$-algebra for the random variables $r_{t'}^n$ and $s_{t'}^n$, but we will introduce
further random variables, which will be measurable by some $\mathcal{A}_t$ or $\mathcal{A}$ as well, to expand our model.
If after the $T$’th step the process is stopped, the solution is $G_T^1$. $G_T^1$ is $\mathcal{A}_T$-measurable because it
is the argmin of the local attractors $L_T^1, \ldots, L_T^N$, whereas $G_t^n$ is in general not $\mathcal{A}_t$-measurable as it partially
depends on $L_{t+1}$. In [1], one can find a comprehensive introduction to $\sigma$-algebras and filtrations.
Informally, $\mathcal{A}_t$ captures the information of the underlying stochastic process that is available at
time $t$, i.e., a random variable is $\mathcal{A}_t$-measurable iff its value is determined at time $t$, and for a
random variable $V$, $E[V \mid \mathcal{A}_t]$ is the expectation of $V$ when the information of every time
$t' \leq t$ is taken into account. $E[V \mid \mathcal{A}_t]$ is an $\mathcal{A}_t$-measurable random variable, e.g., if $V$ is $\mathcal{A}_t$-measurable
then $E[V \mid \mathcal{A}_t] = V$.

The update process of a particle is visualized in Figure 1. As specified in the movement equations
of Definition 1, the new velocity $V_{t+1}^n$ of particle $n$ consists of a fraction of the old velocity
without any randomness, $\chi \cdot V_t^n$, a randomized part of the vector from the current position to the
local attractor, $c_1 \cdot r_t^n \odot (L_t^n - X_t^n)$, and a randomized part of the vector from the current position to
the global attractor, $c_2 \cdot s_t^n \odot (G_t^n - X_t^n)$. With PSO parameters $\chi = 0.72984$ and $c_1 = c_2 = 1.496172$,
as proposed in [1], the terminal point of the randomized vector from the current position to the
local attractor is sampled uniformly in the blue area of Figure 1 and the terminal point of the
randomized vector from the current position to the global attractor is sampled uniformly in the
green area of Figure 1. A possible assignment for the three vectors is visualized by the dashed
vectors. The accumulation of the three parts yields the new velocity $V_{t+1}^n$, and the new position
$X_{t+1}^n$ can be reached by moving $X_t^n$ by $V_{t+1}^n$. After the update of the velocity and the position, the
local and global attractor may be updated. Once this step has also been executed, the process continues
with the next particle. Algorithm 1 states the respective approach in pseudo code.

Figure 1: Update process of a particle with $\chi = 0.72984$ and $c_1 = c_2 = 1.496172$: position $X$,
velocity $V$, local attractor $L$ and global attractor $G$
Algorithm 1: Classical PSO
input : objective function f : R^D → R
output: an optimized position G ∈ R^D for the objective function f

/* initialize all N particles: */
/* the positions X ∈ (R^D)^N and the velocities V ∈ (R^D)^N */
(X, V) := getInitialPositionsAndVelocities();
/* the local attractors L ∈ (R^D)^N */
L := X;
/* initialize the global attractor G ∈ R^D */
G := argmin_{x ∈ {L[1], ..., L[N]}} f(x);
while termination criterion not fulfilled do
    for n := 1 to N do
        for d := 1 to D do
            /* update d’th velocity coordinate of n’th particle */
            /* rand(a, b) supplies a uniform random value in [a, b] */
            V[n][d] := χ · V[n][d]
                       + c1 · rand(0.0, 1.0) · (L[n][d] − X[n][d])
                       + c2 · rand(0.0, 1.0) · (G[d] − X[n][d]);
        end
        /* update position of n’th particle */
        X[n] := X[n] + V[n];
        /* update local attractor of n’th particle */
        if f(X[n]) < f(L[n]) then
            L[n] := X[n];
        end
        /* update global attractor */
        if f(X[n]) < f(G) then
            G := X[n];
        end
    end
end
return G;
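As an illustration, Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under assumptions that are not fixed by the text: the function name `classical_pso`, the uniform initialization box `init_range`, the velocity initialization, and a fixed iteration count as termination criterion are all hypothetical choices. The default parameters are the values used for Figure 1.

```python
import random


def sphere(x):
    """Sphere function, used here only as an example objective."""
    return sum(v * v for v in x)


def classical_pso(f, dim, n_particles, iterations,
                  chi=0.72984, c1=1.496172, c2=1.496172,
                  init_range=(-100.0, 100.0), seed=None):
    """Sketch of Algorithm 1; initialization and stopping rule are assumptions."""
    rng = random.Random(seed)
    lo, hi = init_range
    half = (hi - lo) / 2.0
    # Positions X and velocities V (assumed: uniform random initialization).
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[rng.uniform(-half, half) for _ in range(dim)] for _ in range(n_particles)]
    # Local attractors start at the initial positions.
    L = [x[:] for x in X]
    fL = [f(x) for x in L]
    # Global attractor: best of all local attractors.
    g = min(range(n_particles), key=lambda n: fL[n])
    G, fG = L[g][:], fL[g]
    for _ in range(iterations):
        for n in range(n_particles):
            for d in range(dim):
                # Movement equation for the d'th velocity coordinate.
                V[n][d] = (chi * V[n][d]
                           + c1 * rng.random() * (L[n][d] - X[n][d])
                           + c2 * rng.random() * (G[d] - X[n][d]))
            for d in range(dim):
                X[n][d] += V[n][d]
            fx = f(X[n])
            if fx < fL[n]:          # update local attractor
                L[n], fL[n] = X[n][:], fx
            if fx < fG:             # update global attractor, visible to the next particle
                G, fG = X[n][:], fx
    return G, fG
```

Because the global attractor is updated inside the particle loop, the next particle already sees the new value within the same iteration, which matches the remark in Definition 1 that $G_t^n$ can take several values in one iteration.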





3 Stagnation Analysis of PSO 


In this section some assumptions are made such that a theoretical analysis can be done. With these
assumptions theoretical proofs are provided which state that PSO almost surely does not reach
local optima if the number of particles is too small.


3.1 Theoretical Tools 

First of all some theoretical tools are introduced to make later proofs more manageable. The
following technical, well-known lemma, which will be used in the proof of Lemma 2, gives sufficient
conditions for an infinite product of probabilities to have a positive value.

Lemma 1. If $0 \leq a_i < 1$ for every $i \in \mathbb{N}$ and $\sum_{i=1}^{\infty} a_i < \infty$, then

\[ \prod_{i=1}^{\infty} (1 - a_i) > 0 . \]

Proof. $\prod_{i=1}^{\infty} (1 - a_i) > 0$ iff $\exists n_0 \in \mathbb{N} : \prod_{i=n_0}^{\infty} (1 - a_i) > 0$. Choose $n_0$ sufficiently large such that
$\sum_{i=n_0}^{\infty} a_i < 1$. It follows that $\prod_{i=n_0}^{\infty} (1 - a_i) \geq 1 - \sum_{i=n_0}^{\infty} a_i > 0$. □
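Lemma 1 can be illustrated numerically. The following minimal sketch compares partial products for a summable and a non-summable sequence; the two sequences and the function names are illustrative choices, not taken from the paper. For $a_i = 1/(i+1)^2$ the partial products telescope to $(n+2)/(2(n+1))$ and approach $1/2$, while for $a_i = 1/(i+1)$ they equal $1/(n+1)$ and tend to zero.

```python
def partial_product(a, n_terms):
    """Return prod_{i=1}^{n_terms} (1 - a(i))."""
    prod = 1.0
    for i in range(1, n_terms + 1):
        prod *= 1.0 - a(i)
    return prod


def a_summable(i):
    # sum_i 1/(i+1)^2 is finite, so Lemma 1 applies.
    return 1.0 / (i + 1) ** 2


def a_harmonic(i):
    # sum_i 1/(i+1) diverges, so Lemma 1 gives no guarantee.
    return 1.0 / (i + 1)
```

The second sequence shows that the summability condition cannot simply be dropped: all factors lie in $(0, 1)$, yet the product vanishes.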

In the following, Markov’s inequality is applied to prove that the partial sums of independent,
identically distributed random variables, with negative expectation and finite first six moments
each, stay below zero forever with probability greater than zero.

Lemma 2. Let $(I_t)_{t \in \mathbb{N}}$ be independent identically distributed (i.i.d.) random variables with expected
value $\mu^*$, let $\mu < 0$, $M > 0$ and $p_0 > 0$ be constants. If $\mu^* \leq \mu$, $E[(I_t - \mu^*)^i] \leq M$ for all
$i \in \{1, \ldots, 6\}$ and $P(I_t \leq 0) \geq p_0$, then:

• $P\big(\sum_{\tilde t=0}^{t-1} I_{\tilde t} > 0\big) \leq \hat C / t^3$ for a constant $\hat C > 0$, such that $\hat C := \hat C(\mu, M)$, and for all $t > 0$,

• $P\big(\forall t \in \mathbb{N} : \sum_{\tilde t=0}^{t-1} I_{\tilde t} \leq 0\big) \geq p$ for a constant $p > 0$, such that $p := p(\mu, M, p_0)$.

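Proof. As $E[I_0 - \mu^*] = 0$ and the $I_t$ are i.i.d., we get

\[
\begin{aligned}
E\Big[\Big(\sum_{\tilde t=0}^{t-1} (I_{\tilde t} - \mu^*)\Big)^6\Big]
&= t \cdot E[(I_0 - \mu^*)^6] + 10\,t(t-1) \cdot E[(I_0 - \mu^*)^3]^2 \\
&\quad + 15\,t(t-1)\,E[(I_0 - \mu^*)^2]\,E[(I_0 - \mu^*)^4] + 15\,t(t-1)(t-2)\,E[(I_0 - \mu^*)^2]^3 \\
&\leq C \cdot t^3,
\end{aligned}
\]

where

\[
\begin{aligned}
C := C(M) &:= M + 10 \cdot M^2 + 15 \cdot M^2 + 15 \cdot M^3 \\
&\geq E[(I_0 - \mu^*)^6] + 10\,E[(I_0 - \mu^*)^3]^2 + 15\,E[(I_0 - \mu^*)^2]\,E[(I_0 - \mu^*)^4] + 15\,E[(I_0 - \mu^*)^2]^3,
\end{aligned}
\]

because every product which contains $E[(I_i - \mu^*)]$ as a factor is zero and therefore only the
exponent six, the pair of exponents two and four, the pair of exponents three and three and the
tuple of exponents two, two and two remain. With Markov’s inequality on the sixth power of
the centralized sum, which is a non-negative random variable, we get

\[
P\Big(\sum_{\tilde t=0}^{t-1} I_{\tilde t} > 0\Big)
\leq P\Big(\Big(\sum_{\tilde t=0}^{t-1} (I_{\tilde t} - \mu^*)\Big)^6 \geq (t \mu^*)^6\Big)
\overset{\text{Markov}}{\leq} \frac{E\big[\big(\sum_{\tilde t=0}^{t-1} (I_{\tilde t} - \mu^*)\big)^6\big]}{(t \mu^*)^6}
\leq \frac{C \cdot t^3}{t^6 \mu^6} = \frac{\hat C}{t^3},
\]

where $\hat C := \hat C(\mu, M) := C(M)/\mu^6$. Those inequalities prove the first part of Lemma 2. To complete
the proof further inequalities are needed, which will be obtained with conditional probabilities. Let
$t \leq T$ be a non-negative integer.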


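\[
\begin{aligned}
&P\Big(\sum_{\tilde t=0}^{T} I_{\tilde t} \leq 0 \,\Big|\, \sum_{\tilde t=0}^{t} I_{\tilde t} \leq 0 \wedge \forall t' < t : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big) \\
&\quad \geq P\Big(\sum_{\tilde t=t+1}^{T} I_{\tilde t} \leq 0 \,\Big|\, \sum_{\tilde t=0}^{t} I_{\tilde t} \leq 0 \wedge \forall t' < t : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big) \\
&\quad = P\Big(\sum_{\tilde t=t+1}^{T} I_{\tilde t} \leq 0 \,\Big|\, \sum_{\tilde t=0}^{t} I_{\tilde t} > 0 \wedge \forall t' < t : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big) \\
&\quad \geq P\Big(\sum_{\tilde t=0}^{T} I_{\tilde t} \leq 0 \,\Big|\, \sum_{\tilde t=0}^{t} I_{\tilde t} > 0 \wedge \forall t' < t : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big).
\end{aligned}
\]

The equality holds because the increments after time $t$ are independent of the increments up to time $t$. As

\[
P\Big(\sum_{\tilde t=0}^{T} I_{\tilde t} \leq 0 \,\Big|\, \forall t' < t : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big)
\]

is a convex combination of the first and the last term of the preceding series of inequalities, it is less
than or equal to the greater value

\[
P\Big(\sum_{\tilde t=0}^{T} I_{\tilde t} \leq 0 \,\Big|\, \sum_{\tilde t=0}^{t} I_{\tilde t} \leq 0 \wedge \forall t' < t : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big)
\]

and by induction

\[
P\Big(\sum_{\tilde t=0}^{T} I_{\tilde t} \leq 0 \,\Big|\, \forall t' < T : \sum_{\tilde t=0}^{t'} I_{\tilde t} \leq 0\Big) \geq P\Big(\sum_{\tilde t=0}^{T} I_{\tilde t} \leq 0\Big).
\]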

Summing up all these probabilities leads to a sum which is bounded:

\[
\sum_{t=1}^{\infty} P\Big(\sum_{\tilde t=0}^{t-1} I_{\tilde t} > 0\Big) \leq \sum_{t=1}^{\infty} \frac{\hat C}{t^3} < \infty. \tag{1}
\]

Since $P(I_t \leq 0) \geq p_0 > 0$, it follows that

\[
P\Big(\forall t \leq T : \sum_{\tilde t=0}^{t-1} I_{\tilde t} \leq 0\Big) \geq P(\forall t < T : I_t \leq 0) \geq p_0^T > 0.
\]

Therefore the probability to reach no positive value is positive for fixed finite times. As stated
in equation (1), the infinite sum of the probabilities to be positive at a specific time
is bounded and therefore Lemma 1 is applicable.

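Altogether we have

\[
\begin{aligned}
P\Big(\forall t \in \mathbb{N} : \sum_{\tilde t=0}^{t-1} I_{\tilde t} \leq 0\Big)
&= \lim_{t \to \infty} P\Big(\forall t' \leq t : \sum_{\tilde t=0}^{t'-1} I_{\tilde t} \leq 0\Big) \\
&= P(I_0 \leq 0) \prod_{t=2}^{\infty} P\Big(\sum_{\tilde t=0}^{t-1} I_{\tilde t} \leq 0 \,\Big|\, \forall t' < t : \sum_{\tilde t=0}^{t'-1} I_{\tilde t} \leq 0\Big) \\
&\geq \prod_{t=1}^{\infty} P\Big(\sum_{\tilde t=0}^{t-1} I_{\tilde t} \leq 0\Big) \\
&\geq \underbrace{\prod_{t=1}^{\infty} \Big(1 - \min\Big(1 - p_0^t, \frac{\hat C}{t^3}\Big)\Big)}_{> 0 \text{ with Lemma 1}} =: p > 0.
\end{aligned}
\]

Obviously $p$ depends only on $\mu$, $M$ and $p_0$ because in the last expression only $\hat C := \hat C(\mu, M)$ and
$p_0$ appear. □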
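The combinatorial coefficients in the sixth-moment expansion from the proof of Lemma 2 can be checked exactly on a small example. The sketch below is an illustration only: the two-point distribution `DIST` (mean $-1$) and all function names are hypothetical choices. It compares the closed formula with a brute-force enumeration of all outcomes, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example distribution as (value, probability) pairs; its mean is -1.
DIST = [(Fraction(-2), Fraction(2, 3)), (Fraction(1), Fraction(1, 3))]


def mean(dist):
    return sum(p * x for x, p in dist)


def central_moment(dist, k):
    """k'th central moment E[(I - mu*)^k] of a finite distribution."""
    mu = mean(dist)
    return sum(p * (x - mu) ** k for x, p in dist)


def sixth_moment_formula(dist, t):
    """Closed formula for E[(sum of t centered i.i.d. copies)^6]."""
    m2, m3, m4, m6 = (central_moment(dist, k) for k in (2, 3, 4, 6))
    return (t * m6
            + 10 * t * (t - 1) * m3 ** 2
            + 15 * t * (t - 1) * m2 * m4
            + 15 * t * (t - 1) * (t - 2) * m2 ** 3)


def sixth_moment_bruteforce(dist, t):
    """Same quantity, computed by enumerating all len(dist)**t outcomes."""
    mu = mean(dist)
    total = Fraction(0)
    for outcome in product(dist, repeat=t):
        prob = Fraction(1)
        centered_sum = Fraction(0)
        for x, p in outcome:
            prob *= p
            centered_sum += x - mu
        total += prob * centered_sum ** 6
    return total
```

Since every term of the formula grows at most like $t^3$, the agreement of both computations also makes the bound $C \cdot t^3$ plausible for this example.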


The next lemma provides some estimation for the moments of two random variables if a bound
for the sixth moment of their sum is provided.

Lemma 3. Let $A$ and $B$ be independent random variables with finite first six moments and
expectation zero. There exists a fixed function $h : \mathbb{R} \to \mathbb{R}$ such that for each $i \in \{1, \ldots, 6\}$

• $|E[(A + B)^i]| \leq h(M)$

• $|E[A^i]| \leq h(M)$

• $|E[B^i]| \leq h(M)$

if $E[(A + B)^6] \leq M$.

Proof. For each $i \in \{1, \ldots, 6\}$ we define fixed functions $h_{1,i} : \mathbb{R} \to \mathbb{R}$ and $h_{2,i} : \mathbb{R} \to \mathbb{R}$ such that

• $|E[(A + B)^i]| \leq h_{1,i}(M)$,

• $|E[A^i]| \leq h_{2,i}(M)$ and

• $|E[B^i]| \leq h_{2,i}(M)$,

if $E[(A + B)^6] \leq M$.

All expected values exist as specified. Only the inequalities without absolute value need to be
shown because all evaluations can also be made with $-A$ instead of $A$ and $-B$ instead of $B$, which
guarantees the inequality for odd $i$. For even $i$ the absolute value has no effect on the inequality
because the value inside the absolute value is already non-negative. As random variables $A$ and
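$B$ are independent, the expectation of a product can easily be separated. For $i = 1$ all expected
values are zero and therefore the following functions are possible:

\[ h_{1,1}(M) := h_{2,1}(M) := 0. \]

$h_{1,i}(M) := M + 1$ is possible for all $i \in \{2, \ldots, 6\}$, because

\[
\begin{aligned}
E[(A + B)^i] &\leq E[\max(1, (A + B)^i)] \\
&\leq E[\max(1, (A + B)^6)] \\
&\leq E[1 + (A + B)^6] \\
&\leq 1 + M.
\end{aligned} \tag{2}
\]

For $i = 4$ we have

\[
E[(A + B)^4] = E[A^4] + 4 E[A^3] \underbrace{E[B]}_{= 0} + \underbrace{6 E[A^2] E[B^2]}_{\geq 0} + 4 \underbrace{E[A]}_{= 0} E[B^3] + E[B^4],
\]

which leads to

\[
\underbrace{E[A^4]}_{\geq 0} + \underbrace{E[B^4]}_{\geq 0} \leq E[(A + B)^4] \leq h_{1,4}(M)
\]

and therefore

\[ h_{2,4}(M) := h_{1,4}(M) = M + 1 \]

is possible. Similar to inequalities (2),

\[ h_{2,i}(M) := h_{2,4}(M) + 1 = M + 2 \]

is obtained for all $i \in \{1, 2, 3\}$. Similar to the case $i = 4$ we obtain

\[
\underbrace{E[(A + B)^6]}_{\leq h_{1,6}(M)} = E[A^6] + \underbrace{\binom{6}{2} E[A^2] E[B^4]}_{\geq 0} + \binom{6}{3} \underbrace{E[A^3] E[B^3]}_{\geq -h_{2,3}(M)^2} + \underbrace{\binom{6}{4} E[A^4] E[B^2]}_{\geq 0} + E[B^6],
\]

which leads to

\[
\underbrace{E[A^6]}_{\geq 0} + \underbrace{E[B^6]}_{\geq 0} \leq h_{1,6}(M) + \binom{6}{3} h_{2,3}(M)^2 =: h_{2,6}(M)
\]

and finally $h_{2,5}(M) := 1 + h_{2,6}(M)$.

For each $M$ the maximal function value of all functions $h_{i,j}$ is a suitable choice for the function
value of $h$:

\[ h(M) := \max_{i \in \{1, 2\},\, j \in \{1, \ldots, 6\}} h_{i,j}(M). \]

□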
3.2 Potential and Stagnation Phases

Now the idea of a stagnation measure is introduced, which is a multidimensional extension of the
potential used in [16, 18]. For every step, a $D$-dimensional vector of potentials is evaluated,
one potential value for each dimension. It is intended that the greater the value of such a potential
for a single dimension is, the greater is the impact of this dimension on the behavior of the swarm,
i.e., the greater is the portion of the change in the function value which is due to the movement in
that dimension. This property is not declared in the definition because it cannot be quantified in a
strict way. It will be specified in detail in the assumptions stated later. Furthermore, the logarithmic
potential is defined, which compares the impact of a specific dimension with the maximal impact
along all dimensions. The dimension which currently has the highest impact on the swarm has
a logarithmic potential of zero and all other dimensions will have no larger logarithmic potential.
Since the convergence analysis in [8] implies that the general movement of a converging particle
swarm drops exponentially, a logarithmic scale is used and linear decrease is expected.

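Definition 2 (Potential). Let $\varphi : (\mathbb{R}^D)^N \times (\mathbb{R}^D)^N \times (\mathbb{R}^D)^N \to \mathbb{R}^D$ be a measurable function. Let $\Delta t$ be a positive
integer constant, which will be called the step width. $\Phi(t, d)$ is defined as

\[ \Phi(t, d) := \big(\varphi(X_{t \cdot \Delta t}, V_{t \cdot \Delta t}, L_{t \cdot \Delta t})\big)_d . \]

$\Phi : \mathbb{N}_0 \times \{1, \ldots, D\} \to (\Omega \to \mathbb{R})$ is a function which evaluates to a random variable for each pair
in $\mathbb{N}_0 \times \{1, \ldots, D\}$. $\Phi$ is called a potential if $\Phi(t, d)$ is positive almost surely for all $t$ and $d$.
Additionally

\[
\Psi(t, d) := \log\Big(\Phi(t, d) \Big/ \max_{\tilde d \in \{1, \ldots, D\}} \Phi(t, \tilde d)\Big) = \log\big(\Phi(t, d)\big) - \log\Big(\max_{\tilde d \in \{1, \ldots, D\}} \Phi(t, \tilde d)\Big)
\]

is called a logarithmic potential and $(I_{t,d})_{t \in \mathbb{N}_0}$ with $I_{t,d} := \Psi(t + 1, d) - \Psi(t, d)$ are called increments
of dimension $d$. $\Phi(t, d)$, $\Psi(t, d)$ and $I_{t-1,d}$ are $\mathcal{A}_{\Delta t \cdot t}$-measurable random variables, where $\mathcal{A}_t$ is the
$\sigma$-algebra which is specified in Definition 1.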
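The logarithmic potential and its increments from Definition 2 can be computed directly from a vector of potential values. The following minimal sketch uses the base-2 logarithm, which is the convention of this paper; the function names and the list-based data layout (one list of $D$ positive potentials per sampled time) are assumptions made for illustration.

```python
import math


def logarithmic_potential(phi_t):
    """Map the positive potentials of all dimensions at one time to Psi(t, .)."""
    log_max = math.log2(max(phi_t))
    return [math.log2(p) - log_max for p in phi_t]


def increments(psi_series, d):
    """Increments I_{t,d} = Psi(t+1, d) - Psi(t, d) of dimension d."""
    return [psi_series[t + 1][d] - psi_series[t][d]
            for t in range(len(psi_series) - 1)]
```

By construction the dimension with the largest potential has logarithmic potential zero and every other dimension has a non-positive value, so a value of $-4$ means that the potential of this dimension is $2^{-4}$ times the largest potential.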

In this paper the expression $\log$ denotes the logarithm of base 2. The step width
$\Delta t$ specifies how the time is scaled with respect to the potential, which means that the potential is
only evaluated for PSO configurations $(X_t, V_t, L_t)$ if $t$ is a multiple of $\Delta t$. The effect of $\Delta t$ will
be explained later. In Section 4 and in all figures a potential is used which is similar to the
item-wise product of the current velocity and the gradient of the function at the current position.
More precisely, the following potential will be used:

Definition 3 (Experimental potential). The potential is defined as

\[
\Phi(t, d) := \max_{1 \leq n \leq N} \big| f(X^n_{t \cdot \Delta t}) - f(\tilde X^{n,d}_{t \cdot \Delta t}) \big|,
\quad \text{where } (\tilde X^{n,d}_t)^{\tilde d} :=
\begin{cases}
X_t^{n,\tilde d} + V_t^{n,\tilde d} & \text{if } \tilde d = d, \\
X_t^{n,\tilde d} & \text{otherwise}
\end{cases}
\]

and $f$ is the objective function which should be optimized.

$\tilde X^{n,d}$ represents the position of a particle if only a step in dimension $d$ is done. The derivative
in one dimension is defined as a limit of a difference quotient. We get this difference quotient by
dividing the potential by the velocity. If the absolute value of the velocity becomes very small we
get something similar to the item-wise product of the current velocity and the gradient as already
mentioned. If the absolute value of the gradient in some dimension tends to zero then higher
derivatives influence the potential. It might be possible to use other potentials, but for the objective
functions which are considered this potential is sufficient. Mainly there are two prerequisites which
need to be fulfilled such that this potential is suitable. Firstly, the velocities need to be nonzero
almost surely. If not all particles are initialized at the same point, then the velocities will almost
surely never be zero. Even for zero velocity initialization this is almost surely
true if the evaluation of the potential starts at the second step and standard position initialization
is used. Secondly, the objective function needs to fulfill the property that there exists no set of
positive measure such that each point evaluates to the same function value. For most continuous
functions which have no plateaus this prerequisite is fulfilled. Therefore this potential is positive
for such functions almost surely if a standard PSO initialization is used.
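A potential of the kind described in Definition 3 can be sketched as follows. This is an illustration under assumptions: the maximum is taken over all particles, positions and velocities are passed as lists of lists (`X[n][d]`, `V[n][d]`) for one sampled time, and the function name is hypothetical.

```python
def experimental_potential(f, X, V):
    """For each dimension d, the largest change of f over all particles
    when a particle moves only in dimension d by its current velocity."""
    n_particles = len(X)
    dim = len(X[0])
    phi = []
    for d in range(dim):
        largest = 0.0
        for n in range(n_particles):
            stepped = X[n][:]          # copy, then step only in dimension d
            stepped[d] += V[n][d]
            largest = max(largest, abs(f(X[n]) - f(stepped)))
        phi.append(largest)
    return phi
```

For the sphere function with one particle at $(1, 2)$ and velocity $(1, 0.5)$, a step in the first dimension alone changes the function value from $5$ to $8$, and a step in the second dimension alone changes it from $5$ to $7.25$, so the potential vector is $(3, 2.25)$.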

By means of the logarithmic potential, a stagnation phase can be defined. If the logarithmic
potential of a dimension becomes very low, then there are other dimensions which currently have
much more impact on the swarm behavior. Finally, a dimension which has very low logarithmic
potential for a long period of time will be far away from an optimized position in this dimension.
To specify this in more detail, the following definition is stated:

Definition 4 (($\Delta t, \Phi, N_0, c_0, c_s$)-Stagnation phase). Let $\Delta t$ be a constant step width, $\Phi$ a potential
with the associated logarithmic potentials $\Psi$ and their increments $I_{t,d}$, $N_0$ a positive constant integer
less than $D$ and $c_0 < c_s < 0$ negative constants. The following stopping times are defined:

\[ \beta_{-1} := 0 \]

and inductively for all $i \geq 0$:

\[
\begin{aligned}
\alpha_i &:= \Delta t \cdot \inf\Big\{ t \geq \frac{\beta_{i-1}}{\Delta t} : \big|\{ d \in \{1, \ldots, D\} : \Psi(t, d) \leq c_0 \}\big| \geq N_0 \Big\}, \\
\beta_i &:= \Delta t \cdot \inf\Big\{ t \geq \frac{\alpha_i}{\Delta t} : \Big|\Big\{ d \in \{1, \ldots, D\} : \Psi\Big(\frac{\alpha_i}{\Delta t}, d\Big) \leq c_0 \wedge \max_{\frac{\alpha_i}{\Delta t} \leq t' \leq t} \Psi(t', d) \leq c_s \Big\}\Big| < N_0 \Big\}.
\end{aligned}
\]

If $\alpha_i$ is finite, then it is said that the $i$’th $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase starts at time $\alpha_i$. If $\beta_i$
is finite too, then it is said that the $i$’th $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase ends at time $\beta_i$. If $\alpha_i$ is
finite but $\beta_i$ is not finite then the $i$’th $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase does not end. The event
that a $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase starts at time $T_0 \Delta t$ is defined as $\{\exists i \in \mathbb{N}_0 : \alpha_i = T_0 \Delta t\}$.
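The stopping times of Definition 4 can be evaluated on recorded data. The following minimal sketch assumes that the logarithmic potential is available as a finite table `psi[t][d]`, sampled at multiples of the step width, and it reports phases in units of sampled steps (multiply by $\Delta t$ to obtain $\alpha_i$ and $\beta_i$). The function name and the convention that `None` marks a phase that has not ended within the recorded data are assumptions for illustration.

```python
def stagnation_phases(psi, n0, c0, cs):
    """Return a list of (start, end) index pairs of stagnation phases.

    psi[t][d] is the logarithmic potential of dimension d at sampled time t,
    n0 the minimal number of stagnating dimensions, c0 < cs < 0 the bounds.
    """
    phases = []
    t = 0
    while t < len(psi):
        # Dimensions at or below the starting bound c0 at time t.
        low = [d for d in range(len(psi[t])) if psi[t][d] <= c0]
        if len(low) < n0:
            t += 1
            continue
        start = t
        running_max = {d: psi[t][d] for d in low}
        end = None
        for u in range(start + 1, len(psi)):
            for d in low:
                running_max[d] = max(running_max[d], psi[u][d])
            # Dimensions that stayed at or below cs during the whole phase.
            still_low = sum(1 for d in low if running_max[d] <= cs)
            if still_low < n0:
                end = u
                break
        phases.append((start, end))
        if end is None:
            break
        t = end  # the next phase may start at the time the previous one ended
    return phases
```

On a finite record an end value of `None` only states that the phase did not end within the recorded steps. It cannot distinguish this from a phase that never ends.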

$N_0$ defines some minimal number of dimensions which stagnate during the complete phase,
i.e., the logarithmic potential in these dimensions is at most $c_s$ during the stagnation phase. $c_0$
defines a starting safety distance to the highest potential and $c_s$ defines a permanent safety distance
to the highest potential such that dimensions which stay below that bound remain insignificant.
It is intended that stagnating dimensions, i.e., dimensions with low
logarithmic potential, do not influence the attractors, because the swarm behaves differently if additional
dimensions influence the attractors. To ensure that stagnating dimensions have almost no influence
on the swarm, the logarithmic potential of stagnating dimensions needs to be the safety distance
apart from the logarithmic potential of the dimension with most influence. The bigger this safety distance
is, the more reasonable is the assumption of actual insignificance of stagnating dimensions. It
might happen that $\alpha_i = \beta_{i-1}$. For example, the set $\{1, \ldots, N_0\}$ can be the set of dimensions
which have a logarithmic potential of at most $c_0$ at the start of a stagnation phase. During the
stagnation phase the logarithmic potential of dimension $(N_0 + 1)$ can become less than $c_0$. If
the logarithmic potentials of dimensions $2, \ldots, N_0$ stay below $c_0$ and the logarithmic potential of
dimension 1 increases to a value larger than $c_s$, then this stagnation phase ends and the next
stagnation phase immediately starts because all dimensions in $\{2, \ldots, N_0 + 1\}$ have a logarithmic
potential of less than $c_0$. Therefore it can happen that a stagnation phase starts immediately after
another stagnation phase has ended, i.e., $\alpha_i = \beta_{i-1}$. Furthermore, stagnation phases will not end
at the same time as they start, i.e., $\alpha_i < \beta_i$, because $c_0 < c_s$ and therefore the size of the set of
dimensions with low logarithmic potential is at least $N_0$ at the beginning.

Dimensions with low logarithmic potential have low influence on the change of the value of the
objective function during one step compared to the other dimensions. This is obvious with respect to
the experimental potential, which is introduced in Definition 3, because the $d$’th value represents
the change of the function value if we do a step in dimension $d$ and leave the other dimensions as
they are. If the logarithmic potential of a dimension $d$ is low for some period of time, this dimension
is called a stagnating dimension. Stagnating dimensions are called stagnating because their small
impact on the change in the value of the objective function results in a small impact on the decision
whether a new position is a local attractor or a global attractor. Therefore the decision whether a
new position is an attractor hardly depends on the movement in the stagnating dimensions.

3.3 Unlimited Stagnation Phases 

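Definition 5. Let $\Theta$ be a subset of probability distributions on $\mathbb{R}$. Let $\Delta t$ be the constant step
width. Let $(B_{t,\Gamma_B})_{t \in \mathbb{N}, \Gamma_B \in \Theta}$ and $(J_{t,d,\Gamma_J})_{t \in \mathbb{N}, d \in \{1, \ldots, D\}, \Gamma_J \in \Theta}$ be families of random variables such that

• for all $t > 0$ and $\Gamma_B \in \Theta$, $B_{t-1,\Gamma_B}$ is $\mathcal{A}_{\Delta t \cdot t}$-measurable and has distribution $\Gamma_B$,

• for all $t > 0$, $d \in \{1, \ldots, D\}$ and $\Gamma_J \in \Theta$, $J_{t-1,d,\Gamma_J}$ is $\mathcal{A}_{\Delta t \cdot t}$-measurable and has distribution
$\Gamma_J$,

• for all $t \geq 0$ and $\Gamma_B, \Gamma_J \in \Theta$, the random variables $B_{t,\Gamma_B}, J_{t,1,\Gamma_J}, \ldots, J_{t,D,\Gamma_J}$ are independent
and

• for all $t \geq 0$ and $\Gamma_B, \Gamma_J \in \Theta$, $\sigma\big(B_{t,\Gamma_B}, (J_{t,d,\Gamma_J})_{1 \leq d \leq D}\big)$ is independent from $\mathcal{A}_{\Delta t \cdot t}$.

The $\sigma$ operator provides the smallest $\sigma$-algebra such that the arguments are measurable. A
detailed introduction to $\sigma$-algebras can be found in [1]. Only a small portion of the introduced
variables is needed, but if this huge set of random variables were not defined, random variables would
have to be used which exist only under certain conditions, which is formally not possible.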

From now on the reality is approximated with the following model.

Assumption 1 (Separation of logarithmic potential). It is assumed that for objective functions $f$
there exist $\Delta t$, $N_0$, $c_0$ and $c_s$ such that for any fixed time $T_0 \cdot \Delta t$ there exist $\mathcal{A}_{\Delta t \cdot T_0}$-measurable
$\Theta$-valued random variables $\Gamma_{J,T_0}$ and $\Gamma_{B,T_0}$ such that

• for all $T \geq T_0$ and $d \in \{1, \ldots, D\}$ the random variables $B_{T,\Gamma_{B,T_0}}$ and $J_{T,d,\Gamma_{J,T_0}}$ are $\mathcal{A}_{(T+1)\Delta t}$-measurable,

• for fixed times $T_0 \Delta t$ and $T \Delta t$ and fixed dimension $d$, $P(A \setminus B) = 0$, where

\[
A := \Big\{ (\exists i : \alpha_i = T_0 \Delta t \wedge \beta_i > T \Delta t) \wedge \Psi(T_0, d) \leq c_0 \wedge \max_{T_0 \leq t \leq T} \Psi(t, d) \leq c_s \Big\}
\]

and $B := \{ I_{T,d} = B_{T,\Gamma_{B,T_0}} + J_{T,d,\Gamma_{J,T_0}} \}$,

• for all $\omega \in \Omega$ the expectation of a $\Gamma_{J,T_0}(\omega)$-distributed random variable is zero and

• for all $\omega \in \Omega$ the first six moments of a random variable which has distribution $\Gamma_{B,T_0}(\omega)$ or
$\Gamma_{J,T_0}(\omega)$ exist.

If there is no start of a stagnation phase at time $T_0$ then $\Gamma_{J,T_0}$ and $\Gamma_{B,T_0}$ can have any value
which does not contradict the restrictions in Assumption 1. The value of $\Gamma_{J,T_0}$ and $\Gamma_{B,T_0}$ in
these cases is not used in the analysis.



Figure 2: Logarithmic potential (with potential as specified in Definition 3) on the sphere function in
25 dimensions and with 3 particles; each line represents the logarithmic potential of one dimension;
visualized are time steps 200 000 to 210 000 (horizontal axis: timestep t)

The event $A$ specifies that there is a stagnation phase which has started at time $T_0 \Delta t$ and has
not ended at time $T \Delta t$ or earlier and that dimension $d$ is one of the dimensions with low logarithmic
potential. This event implies almost surely the event $B$, which states that the increments belonging
to the $\Delta t$ time steps after time $T \Delta t$ are composed of two components.

Mainly this assumption states that during stagnation phases the change in logarithmic potential
of dimensions with low logarithmic potential is composed of some basic random noise $(B_{T,\Gamma_{B,T_0}})$,
which depends on the behavior of dimensions with high logarithmic potential, and some individual
random noise $(J_{T,d,\Gamma_{J,T_0}})$, which depends on the randomness in this specific dimension. The basic
random noise $(B_{T,\Gamma_{B,T_0}})$ will be called base increments and the individual random noise $(J_{T,d,\Gamma_{J,T_0}})$
will be called dimension dependent increments. These two parts are independent of each other
and they are also independent for different times $t$ as specified in Definition 5. The reason for
this separation is that the dimensions with low logarithmic potential have no effect on local and
global attractors of the swarm. Therefore these dimensions do not interact with each other. The
distributions of the random variables $B_{T,\Gamma_{B,T_0}}$ and $J_{T,d,\Gamma_{J,T_0}}$ are fixed at time $T_0$, because $\Gamma_{B,T_0}$ and
$\Gamma_{J,T_0}$ are $\mathcal{A}_{T_0 \Delta t}$-measurable. The reason for this assumption is that according to the experiments
the distributions mainly depend on the set of dimensions with low logarithmic potential, which
is fixed at the beginning of a stagnation phase. Therefore $\Theta$, the set of probability distributions
on $\mathbb{R}$ specified in Definition 5, needs to contain only very few distributions. The dimensions with
high logarithmic potential determine the behavior of the particles, which partially determines the
logarithmic potential of the dimensions with low logarithmic potential. An additional part of the
logarithmic potential in a dimension arises from the random variables in the movement equations
of the PSO in that dimension. These two parts determine the logarithmic potential of dimensions
with low logarithmic potential. Figure 2 shows the similar behavior of dimensions with low
logarithmic potential. Admittedly, for any precise choice of a potential these new random variables
will not be completely independent of each other, but if $\Delta t$ is enlarged it can be expected that
the dependencies become small. For example, if $\Delta t$ is set to 1 then the increments describe the
behavior of consecutive steps, which depend on each other strongly, but for a larger
$\Delta t$ larger intervals depend only partially on each other. Therefore it is reasonable to accept the
property of independence in the variable $t$ for the theoretical model as an approximation of the
real-world process.

Furthermore, the dependencies and independencies in Assumption 1 are not needed completely,
but they are used to obtain some inequalities. These inequalities will be described and justified
experimentally in the experimental part (Section 4).


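Definition 6 (($\Delta t$, $\Phi$, $N_0$, $c_0$, $c_s$, $\mu$, $M$, $p_0$)-stagnation phase). Let
$\mu \in \mathbb{R}$, $M \in \mathbb{R}$, $p_0 \in [0,1]$, $T_0 \ge 0$ be constants.

Let $A$ be the event that a $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase starts at time
$T_0\Delta t$, i.e.,

$$A = \{\exists i \in \mathbb{N}_0 : \alpha_i = T_0\Delta t\}.$$

Let $C_1$ be the event that the expectation of a $\Gamma_{B,T_0}$-distributed random variable is
less than or equal to $\mu$, i.e.,

$$C_1 = \{\mathrm{E}[B_{T_0,\Gamma_{B,T_0}} \mid \mathcal{A}_{T_0\Delta t}] \le \mu\}.$$

Let $C_2$ be the event that the sixth moment of a $\Gamma_{B,T_0}$-distributed random variable and
of a $\Gamma_{J,T_0}$-distributed random variable is bounded by $M$, i.e.,

$$C_2 = \{\mathrm{E}[B_{T_0,\Gamma_{B,T_0}}^6 \mid \mathcal{A}_{T_0\Delta t}] \le M \;\wedge\; \mathrm{E}[J_{T_0,1,\Gamma_{J,T_0}}^6 \mid \mathcal{A}_{T_0\Delta t}] \le M\}.$$

Let $C_3$ be the event that a $\Gamma_{B,T_0}$-distributed random variable stays below $\mu/2$ with
a probability of at least $p_0$, i.e.,

$$C_3 = \{\mathrm{P}(B_{T_0,\Gamma_{B,T_0}} - \tfrac{\mu}{2} < 0 \mid \mathcal{A}_{T_0\Delta t}) \ge p_0\} = \{\mathrm{E}[\mathbb{1}_{B_{T_0,\Gamma_{B,T_0}} - \frac{\mu}{2} < 0} \mid \mathcal{A}_{T_0\Delta t}] \ge p_0\}.$$

Let $C_4$ be the event that a $\Gamma_{J,T_0}$-distributed random variable stays below $-\mu/2$
with a probability of at least $p_0$, i.e.,

$$C_4 = \{\mathrm{P}(J_{T_0,1,\Gamma_{J,T_0}} + \tfrac{\mu}{2} < 0 \mid \mathcal{A}_{T_0\Delta t}) \ge p_0\} = \{\mathrm{E}[\mathbb{1}_{J_{T_0,1,\Gamma_{J,T_0}} + \frac{\mu}{2} < 0} \mid \mathcal{A}_{T_0\Delta t}] \ge p_0\}.$$

A $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase is called
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase if it satisfies the conditions
mentioned in $C_1$, $C_2$, $C_3$ and $C_4$, i.e., $A \cap C_1 \cap C_2 \cap C_3 \cap C_4$ is called
the event that a $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase starts at time
$T_0\Delta t$.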

By definition, the question whether a $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phase which
starts at some time $T_0 \cdot \Delta t$ is a $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation
phase can be answered at the beginning of that phase, i.e., the indicator function of this event
is $\mathcal{A}_{T_0\Delta t}$-measurable.

Definition 7 (Good stagnation phase). A $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation
phase is called good if $\mu < 0$ and $p_0 > 0$.

These definitions only classify stagnation phases. For a single stagnation phase which has
negative expectation and bounded sixth moment, $\mu$ and $M$ can be chosen as the exact values.
Furthermore, the probabilities which should be bounded below by $p_0$ are positive, because the
probability that a random variable takes a value less than or equal to its expectation, or less
than an even larger threshold, is always positive. Such stagnation phases are called good because
it can be proved that, with positive probability, those phases never end.

The following theorem shows that, with positive probability, the swarm never recovers from
encountering a good stagnation phase.

Theorem 1. A good $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase does not end with
a probability which is at least $p := p(N_0, \mu, M, p_0) > 0$.

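Proof. Let $X_0$, $V_0$ and $L_0$ be initialized such that a good
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase starts at time 0 and let
$\Gamma_J$ and $\Gamma_B$ be the related probability distributions of this stagnation phase
specified in Assumption 1. As the PSO process is a Markov process, evaluating the process which
starts in a good stagnation phase is equivalent to evaluating a good stagnation phase within a PSO
run. Let $S_0$ be $\{d \in \{1,\dots,D\} \mid \Psi(0,d) < c_0\}$, the starting subset. $S_0$,
$\Gamma_B$ and $\Gamma_J$ are fixed objects here, because they are always known at the beginning of
a stagnation phase. Then the probability that this good stagnation phase does not end is equal to

$$\mathrm{P}\Big(\exists U \subseteq S_0 : |U| \ge N_0 \wedge \sup_{d \in U,\, t \ge 0} \Psi(t,d) < c_s\Big).$$

Let $S$ be defined as $\{d \in S_0 : |\{1,\dots,d\} \cap S_0| \le N_0\}$, the subset of $S_0$ which
contains the $N_0$ smallest indices. $S$ is a fixed object too. The probability that this good
stagnation phase does not end is at least $\mathrm{P}(\sup_{t \ge 0,\, d \in S} \Psi(t,d) < c_s)$.
Let $I_{t,d} := B_{t,\Gamma_B} + J_{t,d,\Gamma_J}$ (see Assumption 1). Then the actual increment of
the logarithmic potential in dimension $d$ at time $t$ equals $I_{t,d}$ if the stagnation phase has
not ended at time $t$ or earlier and $d$ is in the set
$\{d \in S_0 \mid \max_{0 \le t' \le t} \Psi(t',d) < c_s\}$. Therefore the actual increments equal
$I_{t,d}$ for all $d \in S$ if $\max_{d \in S,\, 0 \le t' \le t} \Psi(t',d) < c_s$. It follows

$$
\begin{aligned}
\mathrm{P}\Big(\sup_{t \ge 0,\, d \in S} \Psi(t,d) < c_s\Big)
&= \mathrm{P}\Big(\sup_{t \ge 0,\, d \in S} \sum_{t'=0}^{t-1} I_{t',d} + \Psi(0,d) < c_s\Big) \\
&\ge \mathrm{P}\Big(\sup_{t \ge 0,\, d \in S} \sum_{t'=0}^{t-1} I_{t',d} + c_0 \le c_s\Big)
 = \mathrm{P}\Big(\sup_{t \ge 0,\, d \in S} \sum_{t'=0}^{t-1} I_{t',d} \le c_s - c_0\Big) \\
&\ge \mathrm{P}\Big(\sup_{t \ge 0,\, d \in S} \sum_{t'=0}^{t-1} I_{t',d} \le 0\Big) \\
&\ge \mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(B_{t',\Gamma_B} - \tfrac{\mu}{2}\big) \le 0 \;\wedge\; \sup_{t \ge 0,\, d \in S} \sum_{t'=0}^{t-1} \big(J_{t',d,\Gamma_J} + \tfrac{\mu}{2}\big) \le 0\Big) \\
&= \mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(B_{t',\Gamma_B} - \tfrac{\mu}{2}\big) \le 0\Big) \cdot \prod_{d \in S} \mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(J_{t',d,\Gamma_J} + \tfrac{\mu}{2}\big) \le 0\Big).
\end{aligned}
$$

The expectations of $B_{t,\Gamma_B} - \mu/2$ and $J_{t,d,\Gamma_J} + \mu/2$ are at most $\mu/2$,
which is less than zero. With Lemma 3 the first six moments of

$$\big(B_{t,\Gamma_B} - \tfrac{\mu}{2} - \mathrm{E}[B_{t,\Gamma_B} - \tfrac{\mu}{2}]\big) \quad\text{and}\quad \big(J_{t,d,\Gamma_J} + \tfrac{\mu}{2} - \mathrm{E}[J_{t,d,\Gamma_J} + \tfrac{\mu}{2}]\big)$$

can be bounded by $h(M)$. Therefore, by the corresponding lemma,

$$\mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(B_{t',\Gamma_B} - \tfrac{\mu}{2}\big) \le 0\Big) \ge p(\mu/2, h(M), p_0) > 0$$

and

$$\mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(J_{t',d,\Gamma_J} + \tfrac{\mu}{2}\big) \le 0\Big) \ge p(\mu/2, h(M), p_0) > 0.$$

Altogether we have

$$
\begin{aligned}
\mathrm{P}\Big(\exists U \subseteq S_0 : |U| \ge N_0 \wedge \sup_{d \in U,\, t \ge 0} \Psi(t,d) < c_s\Big)
&\ge \mathrm{P}\Big(\sup_{t \ge 0,\, d \in S} \Psi(t,d) < c_s\Big) \\
&\ge \mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(B_{t',\Gamma_B} - \tfrac{\mu}{2}\big) \le 0\Big) \cdot \prod_{d \in S} \mathrm{P}\Big(\sup_{t \ge 0} \sum_{t'=0}^{t-1} \big(J_{t',d,\Gamma_J} + \tfrac{\mu}{2}\big) \le 0\Big) \\
&\ge p\big(\tfrac{\mu}{2}, h(M), p_0\big) \cdot p\big(\tfrac{\mu}{2}, h(M), p_0\big)^{N_0} \\
&=: p(N_0, \mu, M, p_0) =: p.
\end{aligned}
$$

□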

According to Donsker's theorem [2], incremental processes composed of independent identically
distributed increments converge to a Brownian Motion if they are correctly scaled. As our general
model uses independent increments, we also discuss the Brownian Motion as a feasible model for our
process. To acquire Brownian Motion it is additionally assumed in the model that the increments
are normally distributed during good stagnation phases. Then a Brownian Motion with some
variance $\sigma^2$ and negative drift $\mu$, according to the expectation and variance of the
increments, approximates the logarithmic potential, i.e., the Brownian Motion at discrete times
would be equal to the logarithmic potential at these times. In [TJ], it is proved that the
probability that a standard Brownian Motion crosses some line $y = m \cdot x + t$ is equal to
$\exp(-2 \cdot m \cdot t)$ for positive values $m$ and $t$. Under these assumptions,
$\sum_{t'=0}^{t-1} (I_{t',d} - \mu)/\sigma$ can be approximated by a standard Brownian Motion,
i.e., a Brownian Motion with expectation 0 and variance 1 per time step. The previous bound $c_s$
is now moved to the line $y = -\frac{\mu}{\sigma} \cdot x + \frac{c_s - c_0}{\sigma}$. If the
Brownian Motion stays below this line, then it also stays below the line at the discrete time
steps referring to time steps of the PSO, which implies that the logarithmic potential stays below
$c_s$ indefinitely. Hence, the probability that the logarithmic potential stays below $c_s$ forever
is approximated by $1 - \exp(2 \cdot \mu \cdot (c_s - c_0)/\sigma^2)$, where $\mu$ and $\sigma^2$
represent the expectation and variance of the increments per $\Delta t$ time steps. An approximate
lower bound for the probability that at least $N_0$ dimensions in $S_0$ stay below $c_s$ is given
by the $N_0$-th power of the probability in one dimension:
$(1 - \exp(2 \cdot \mu \cdot (c_s - c_0)/\sigma^2))^{N_0}$. In the previous model we were only
able to determine some positive lower bound for the probability that stagnation remains
indefinitely. This model supplies the previously mentioned formula, which approximates the actual
probability quite well, at least for $N_0 = 1$. In Section 4.5 the values obtained from that
formula will be compared to experimentally measured values.
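The Brownian-Motion estimate described above can be evaluated numerically. The following is a minimal sketch, assuming that the one-dimensional probability has the form $1 - \exp(2\mu(c_s - c_0)/\sigma^2)$ (which follows from the crossing probability $\exp(-2mt)$ with slope $m = -\mu/\sigma$ and offset $t = (c_s - c_0)/\sigma$). The function name and the numeric values in the usage line are illustrative assumptions, not measured values from the experiments.

```python
import math


def stay_below_probability(mu, sigma2, c0, cs, n0=1):
    """Brownian-motion estimate that n0 dimensions stay below cs forever.

    mu     -- expected increment of the logarithmic potential per Delta-t steps (< 0)
    sigma2 -- variance of that increment (> 0)
    c0, cs -- start level and stagnation bound of the logarithmic potential (c0 < cs)
    n0     -- number of dimensions, treated as independent in this approximation
    """
    if mu >= 0 or sigma2 <= 0 or c0 >= cs:
        raise ValueError("need mu < 0, sigma2 > 0 and c0 < cs")
    # Crossing probability exp(-2*m*t) with m = -mu/sigma and t = (cs - c0)/sigma.
    crossing = math.exp(2.0 * mu * (cs - c0) / sigma2)
    return (1.0 - crossing) ** n0


if __name__ == "__main__":
    # Hypothetical increment statistics, chosen only for illustration.
    print(stay_below_probability(mu=-0.1, sigma2=1.0, c0=-40.0, cs=-20.0, n0=7))
```

A larger gap $c_s - c_0$ or a stronger negative drift pushes the estimate towards one, while a larger variance or a larger $N_0$ lowers it.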



Figure 3: Logarithmic potential (with the potential as specified in the corresponding definition)
on the sphere function in 10 dimensions with 2 particles; each line represents the logarithmic
potential of one dimension over the time steps t; $c_0 = -40$, $c_s = -20$, $N_0 = 7$.















3.4 Convergence to Non-Optimal Points 

Definition 8 (Phases). Let $\Delta t$, $\Phi$, $N_0$, $c_0$, $c_s$, $\mu$, $M$ and $p_0$ be fixed
objects. Let $(\alpha_i)_{i \in \mathbb{N}_0}$ and $(\beta_i)_{i \in \mathbb{N}_0}$ be the stopping
times specified in the definition of $(\Delta t, \Phi, N_0, c_0, c_s)$-stagnation phases.
Furthermore, the following stopping times are defined:

$$\tilde\beta_{-1} = 0$$

and inductively for all $i \ge 0$:

$$
\begin{aligned}
\tilde\alpha_i &= \inf\{t \ge \tilde\beta_{i-1} : \exists j \in \mathbb{N}_0 : \alpha_j = t \wedge \text{the } (\Delta t, \Phi, N_0, c_0, c_s)\text{-stagnation phase starting at time } t \\
&\qquad\quad \text{is a } (\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)\text{-stagnation phase}\}, \\
\tilde\beta_i &= \inf\{t \ge \tilde\alpha_i : \exists j \in \mathbb{N}_0 : \beta_j = t\}.
\end{aligned}
$$

$\tilde\alpha_i$ and $\tilde\beta_i$ determine the start and end of the $i$-th
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase. Phases are random time intervals.
Phases of type $PH_X$ are phases before or between
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phases. The $i$-th phase of type $PH_X$ is
defined as

$$
\begin{cases}
[\tilde\beta_{i-1}, \tilde\alpha_i] & \text{if } \tilde\alpha_i < \infty, \\
[\tilde\beta_{i-1}, \infty[ & \text{if } \tilde\alpha_i = \infty \wedge \tilde\beta_{i-1} < \infty, \\
\emptyset & \text{otherwise.}
\end{cases}
$$

Phases of type $PH_Y$ are $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phases which
have an end. The $i$-th phase of type $PH_Y$ is defined as

$$
\begin{cases}
[\tilde\alpha_i, \tilde\beta_i] & \text{if } \tilde\beta_i < \infty, \\
\emptyset & \text{otherwise.}
\end{cases}
$$

The phase of type $PH_F$ is a $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase which
has no end and it is defined as

$$
\begin{cases}
[\tilde\alpha_i, \infty[ & \text{if } \tilde\alpha_i < \infty \wedge \tilde\beta_i = \infty, \\
\emptyset & \text{if no such } i \text{ exists.}
\end{cases}
$$

$X_i$ is defined to be the duration of the $i$-th phase of type $PH_X$:

$$
X_i = \begin{cases}
\tilde\alpha_i - \tilde\beta_{i-1} & \text{if } \tilde\beta_{i-1} < \infty, \\
0 & \text{otherwise.}
\end{cases}
$$

$Y_i$ is defined to be the duration of the $i$-th phase of type $PH_Y$:

$$
Y_i = \begin{cases}
\tilde\beta_i - \tilde\alpha_i & \text{if } \tilde\beta_i < \infty, \\
0 & \text{otherwise.}
\end{cases}
$$

Furthermore, the random variable $T_F$ is defined as the time when the final phase of type $PH_F$
starts:

$$T_F := \sum_{i=0}^{\infty} X_i + Y_i.$$

$Y_i$ is an $\mathbb{N}_0$-valued random variable and $X_i$, $T_F$ are
$\mathbb{N}_0 \cup \{\infty\}$-valued random variables.

A phase of type $PH_X$ is the interval beginning at the start of the PSO or at the end of a
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase and ending at the start of the next
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase. If there is no further such
stagnation phase, then this phase does not end. A phase of type $PH_Y$ is the time interval of a
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase which has an end, and a phase of
type $PH_F$ is the time interval of a $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation
phase which has no end. The duration of a phase equals the difference between the last time step
and the first time step contained in the related time interval and is infinity if the time
interval is unbounded. If the $i$-th instance of a phase of type $PH_X$ does not exist in a
specific run, then $X_i$ is zero. If the $i$-th instance of a phase of type $PH_Y$ does not exist
in a specific run, then $Y_i$ is zero. $T_F$ is infinity if no phase of type $PH_F$ starts. The
process starts with a phase of type $PH_X$. Phases of type $PH_X$ and $PH_Y$ alternate until the
final phase, which has type $PH_F$, begins. Alternatively, the last phase could be a phase of type
$PH_X$, or phases of type $PH_X$ and $PH_Y$ could alternate infinitely often, but it will be shown
in Theorem 2 that almost surely this does not happen. A phase of type $PH_X$ may have duration zero
if the set of dimensions with low logarithmic potential has changed. Figure 3 shows an example run
of a 10-dimensional PSO with 2 particles, $c_0 := -40$, $c_s := -20$ and $N_0 := 7$, i.e., a
stagnation phase starts whenever 7 or more of the 10 dimensions have logarithmic potential below
$-40$.
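The alternation of phases can be illustrated on a recorded series of logarithmic potentials. The sketch below is only an illustration under assumptions: it takes a phase to start as soon as at least $N_0$ dimensions are below $c_0$, and to end as soon as fewer than $N_0$ dimensions of the starting subset have stayed below $c_s$ since the start. The function name and the data layout (`log_pot[t][d]`) are hypothetical and not taken from the enclosed program.

```python
def find_stagnation_phases(log_pot, c0, cs, n0):
    """Return (start, end) index pairs of stagnation phases in a recorded run.

    log_pot -- list of rows; log_pot[t][d] is the logarithmic potential of
               dimension d at the t-th recorded time step
    end is None for a phase that has not ended within the recorded horizon.
    """
    phases = []
    t, horizon = 0, len(log_pot)
    while t < horizon:
        start_set = {d for d, v in enumerate(log_pot[t]) if v < c0}
        if len(start_set) < n0:
            t += 1
            continue
        start, alive, end = t, set(start_set), None
        for u in range(start + 1, horizon):
            # Keep only dimensions of the starting subset that never reached cs.
            alive = {d for d in alive if log_pot[u][d] < cs}
            if len(alive) < n0:
                end = u
                break
        phases.append((start, end))
        if end is None:
            break
        t = end  # a new phase may start immediately (intermediate duration zero)
    return phases


if __name__ == "__main__":
    run = [
        [-10, -10, -10],
        [-45, -50, -10],
        [-30, -55, -10],
        [-15, -60, -10],
        [-50, -60, -45],
    ]
    print(find_stagnation_phases(run, c0=-40, cs=-20, n0=2))
```

In this toy run the first phase covers steps 1 to 3 and a second phase starts at step 4 and is still open at the end of the record.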

In the following definition a group of functions is specified, which allows for comprehensible 
assumptions. 

Definition 9 (Composite functions). A function is called composite if it can be written as
$f(x) = g\big(\sum_{i=1}^{D} f_i(x_i)\big)$, where $g$ is a strictly monotonically increasing
function and $(f_i)_{i \in \{1,\dots,D\}}$ are functions which have a lower bound each.

For example, all $q$-norms $\big(\sum_{i=1}^{D} |x_i|^q\big)^{1/q}$ are composite functions with
$f_i(x_i) := |x_i|^q$ and $g(y) := \operatorname{sign}(y)\,|y|^{1/q}$, where sign represents the
signum function: $\operatorname{sign}(y) := 1$ if $y > 0$, $0$ if $y = 0$ and $-1$ if $y < 0$.
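The $q$-norm example can be written out directly. The following sketch builds a composite function from an outer function $g$ and per-dimension parts $f_i$; the helper names `make_composite` and `q_norm` are introduced here for illustration only.

```python
import math


def make_composite(g, parts):
    """Build f(x) = g(sum_i parts[i](x[i])) from an outer g and per-dimension parts."""
    def f(x):
        if len(x) != len(parts):
            raise ValueError("dimension mismatch")
        return g(sum(f_i(x_i) for f_i, x_i in zip(parts, x)))
    return f


def q_norm(q, dim):
    """q-norm written as a composite function, following the example in the text."""
    # f_i(x_i) = |x_i|^q is bounded below by 0.
    parts = [lambda xi, q=q: abs(xi) ** q] * dim
    # g(y) = sign(y) * |y|^(1/q) is strictly monotonically increasing.
    def g(y):
        return math.copysign(abs(y) ** (1.0 / q), y)
    return make_composite(g, parts)


if __name__ == "__main__":
    print(q_norm(2, 2)([3.0, 4.0]))  # Euclidean norm of (3, 4)
```

Because $g$ is strictly increasing, comparing two positions by $f$ gives the same result as comparing them by the inner sum, which is what makes the per-dimension contributions separable.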

Furthermore, the following notation will be used: 

Definition 10. For objects $W = (W^1, \dots, W^D)$ and $S \subseteq \{1,\dots,D\}$ we write $W^S$
for the object where all entries in dimensions contained in $S$ remain, i.e., let $d_1$ to
$d_{|S|}$ be the dimensions contained in $S$ in increasing order, then
$W^S = (W^{d_1}, \dots, W^{d_{|S|}})$.

Analogously, $W^{\{1,\dots,D\} \setminus S}$ denotes the object where all entries in dimensions
contained in $S$ are removed.

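Assumption 2 (Insignificance of stagnating dimensions (1)). Let $T_0 \in \mathbb{N}$,
$T \in \mathbb{N}$ and let $S$ be a subset of $\{1,\dots,D\}$ such that $T_0 < T$ and
$|S| \ge N_0$. For fixed objects $\Delta t$, $\Phi$, $N_0$, $c_0$, $c_s$ and a composite objective
function $f(x) = g\big(\sum_{i=1}^{D} f_i(x_i)\big)$ a modified PSO can be defined. The modified
PSO starts at time $T_0\Delta t$ with initialization

$$\tilde X_{T_0\Delta t} := X_{T_0\Delta t}^{\{1,\dots,D\} \setminus S}, \quad \tilde V_{T_0\Delta t} := V_{T_0\Delta t}^{\{1,\dots,D\} \setminus S}, \quad \tilde L_{T_0\Delta t} := L_{T_0\Delta t}^{\{1,\dots,D\} \setminus S}$$

and objective function

$$\tilde f(x) := \sum_{i \notin S} f_i(x_i).$$

Furthermore, similar to the definition of the classical PSO, the movement equations for the
modified PSO for all particles $n$ ($1 \le n \le N$) are defined to be

$$
\begin{aligned}
\tilde G^n_{t+1} &:= \operatorname*{argmin}_{x \in \{\tilde L^{n'}_{t+1} :\, n' < n\} \cup \{\tilde L^{n'}_{t} :\, n' \ge n\}} \tilde f(x) && \text{for } t \ge T_0\Delta t, \\
\tilde V^n_{t+1} &:= \chi \cdot \tilde V^n_t + c_1 \cdot r^n_t \odot (\tilde L^n_t - \tilde X^n_t) + c_2 \cdot s^n_t \odot (\tilde G^n_{t+1} - \tilde X^n_t) && \text{for } t \ge T_0\Delta t, \\
\tilde X^n_{t+1} &:= \tilde X^n_t + \tilde V^n_{t+1} && \text{for } t \ge T_0\Delta t, \\
\tilde L^n_{t+1} &:= \operatorname*{argmin}_{x \in \{\tilde X^n_{t+1},\, \tilde L^n_t\}} \tilde f(x) && \text{for } t \ge T_0\Delta t,
\end{aligned}
$$

with reused random variables $(r^n_t)$ and $(s^n_t)$, restricted to the dimensions not in $S$.
Let $A$ and $B$ be the events

$$A := \Big\{\exists i \in \mathbb{N}_0 : \alpha_i = T_0\Delta t \;\wedge\; \max_{d \in S} \Psi(T_0, d) < c_0 \;\wedge\; \max_{d \in S,\, T_0 \le t \le T} \Psi(t, d) < c_s\Big\}$$

and

$$B := \Big\{\tilde X_t = X_t^{\{1,\dots,D\} \setminus S} \wedge \tilde V_t = V_t^{\{1,\dots,D\} \setminus S} \wedge \tilde L_t = L_t^{\{1,\dots,D\} \setminus S} \;\; \forall t \in \{T_0\Delta t, \dots, T\Delta t - 1\}\Big\}.$$

It is assumed that there exist $\Delta t$, $N_0$, $c_0$ and $c_s$ such that Assumption 1 is
applicable and

$$\mathrm{P}(A \setminus B) = 0.$$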

The modified PSO represents the actual PSO, but with the dimensions with low logarithmic potential
completely removed. The event $A$ occurs if a stagnation phase starts at time $T_0\Delta t$ and has
not ended at time $T\Delta t$, and $S$ is the set of stagnating dimensions. The event $B$ occurs if
the reduced process and the actual process are equal at all times $t$ with
$T_0\Delta t \le t < T\Delta t$ for all dimensions not in $S$. The assumption states that almost
surely $A$ implies $B$, which means that this modified PSO behaves like the actual PSO if a
stagnation phase starts at time $T_0\Delta t$ and all dimensions in $S$ are stagnating. Therefore
stagnating dimensions have no effect on the other dimensions, but dimensions not in $S$ do have an
effect on the stagnating dimensions, because the times when attractors are updated are determined
by dimensions not in $S$.

In the following a short motivation is presented why this assumption is reasonable. The potential
used in this paper represents the impact of a single dimension on the change in the function value
in a single step. The probability that a dimension can change the evaluation whether a new position
is a local or global attractor can therefore approximately be bounded by $2^{c_s}$. For stagnation
phases with $c_s = -20$ this value is approximately 0.000 001 and therefore the expected waiting
time until a dimension with that low potential has an effect on the behavior of the swarm is
approximately bounded below by 1 000 000 iterations. Furthermore, for good stagnation phases the
logarithmic potential is expected to decrease linearly on average. Neglecting the randomness, the
logarithmic potential at time $t$ can be bounded by $c_0 - \mu(t - T_0\Delta t)$, where $-\mu$ is
the negative expectation of the change of the logarithmic potential per time step, and therefore
the expected number of time steps in which this dimension has an effect on a decision whether a new
position is a local or global attractor after the beginning of an unlimited stagnation phase is
approximately bounded by

$$\sum_{t > 0} 2^{c_0 - \mu t} \le 2^{c_0} \int_0^{\infty} \exp(-\ln(2)\,\mu\,t)\,\mathrm{d}t = 2^{c_0} \cdot \frac{1}{\ln(2)\,\mu},$$

which is a finite value. Furthermore, $c_0$ can be chosen such that this value is even less than
one. Also, changes in dimensions with low logarithmic potential cannot accumulate over time: if
updates of local and global attractors happen quite often, then accumulation cannot occur, and if
the updates happen rarely, then changes will not accumulate either, because the positions of the
particles will mainly stay between the local and the global attractors.
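The orders of magnitude in this motivation are easy to check numerically. The sketch below assumes base-2 logarithmic potential (so that a level of $-20$ corresponds to $2^{-20} \approx 10^{-6}$) and a hypothetical positive decrease rate `mu` per time step; the function names are introduced here for illustration.

```python
import math


def update_probability_bound(cs):
    """2**cs: rough bound on the chance that a stagnating dimension flips an attractor decision."""
    return 2.0 ** cs


def expected_influence_bound(c0, mu):
    """Closed form 2**c0 / (ln(2) * mu) of the integral bound, for a decrease of mu > 0 per step."""
    if mu <= 0:
        raise ValueError("mu must be positive here")
    return 2.0 ** c0 / (math.log(2.0) * mu)


def expected_influence_sum(c0, mu, steps=10_000):
    """Partial sum of 2**(c0 - mu*t) for t = 1..steps, to compare with the integral bound."""
    return sum(2.0 ** (c0 - mu * t) for t in range(1, steps + 1))


if __name__ == "__main__":
    print(update_probability_bound(-20))        # about one in a million
    print(1.0 / update_probability_bound(-20))  # expected waiting time in iterations
    # mu = 0.5 is a hypothetical value used only for illustration.
    print(expected_influence_sum(-40, 0.5), expected_influence_bound(-40, 0.5))
```

For $c_0 = -40$ both the sum and its integral bound are far below one, in line with the remark that $c_0$ can be chosen to make this value smaller than one.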

The last assumption states that dimensions, which have low logarithmic potential, have no 
effect on the swarm. This means that those dimensions receive a series of decision results whether 
a position is a new local or global attractor. Those decision results influence how the positions in 
dimensions with low logarithmic potential develop. 

In this paper the behavior of the PSO for large time scales is analyzed. Therefore it is necessary 
to define in which cases convergence is present. 

Definition 11 (Swarm convergence). A swarm is said to converge if the coordinates of the global 
and local attractors and the positions converge to the same point and the velocities of the particles 
converge to zero. The limit of the positions is also called point of convergence. 

If the velocity tends exponentially to zero then convergence of attractors and positions also 
appears. 

If there is a stagnation phase which has no end, then there is a non-empty subset of dimensions
which have low logarithmic potential as soon as this last stagnation phase starts. Assumption 2
states that there exists an alternative PSO without those dimensions. Although these dimensions
have no effect on the remaining dimensions, they are still changing. The dimensions with low
logarithmic potential receive a series of updates of the local and global attractors from the
reduced process. If convergence occurs, some additional conclusions can be drawn. Depending on the
update series and the random variables for the dimensions with low logarithmic potential, some
final value for those dimensions is reached. As it is not plausible that the limit of the
coordinates for those dimensions is fixed at the start of the last stagnation phase, it is assumed
that no limit value of these coordinates, neither an optimal value nor any other value, has
positive probability.

Therefore the following assumption is concluded. 

Assumption 3 (Insignificance of stagnating dimensions (2)). For fixed objects $\Delta t$, $\Phi$,
$N_0$, $c_0$, $c_s$, $\mu$, $M$ and $p_0$ let $X_0$, $V_0$ and $L_0$ be initialized such that they
represent a possible start of a good $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation
phase, let $\Gamma_J$ and $\Gamma_B$ be the related probability distributions of this stagnation
phase specified in Assumption 1 and let

$$S \subseteq \{d \in \{1,\dots,D\} \mid \Psi(0,d) < c_0\}$$

such that $|S| = N_0$. Let $\mathcal{E}$ be the event

$$\mathcal{E} := \Big\{\sup_{t \ge 0,\, d \in S} \Psi(t,d) < c_s \;\wedge\; \text{the swarm converges}\Big\}.$$

It is assumed that there exist $\Delta t$, $N_0$, $c_0$, $c_s$, $\mu$, $M$ and $p_0$ such that
Assumption 1 is applicable and

$$\mathrm{P}\Big(\lim_{t \to \infty} X_t^{n,d} = u \;\Big|\; \mathcal{E}\Big) = 0 \quad \text{for all } n \in \{1,\dots,N\},\ u \in \mathbb{R} \text{ and } d \in S.$$

This assumption states that if the PSO reaches a stagnation phase which does not end and the
PSO converges, then almost surely no specific point fixed in advance is reached, because at
least the coordinates of the insignificant dimensions will differ almost surely. This implies that
if the set of local optima is countable, then almost surely none of the local optima will be
reached, because the set of possible coordinate values of optimal points is countable for each
dimension and therefore the dimensions with low logarithmic potential will almost surely not reach
any of those values. Therefore stagnation occurs in those cases.

Assumption 3 is not restricted to composite functions. The behavior for functions which are
not composite is more complicated, but the effect is the same. An aspect which will not be proved
is that dimensions which become relevant from time to time are optimizing. If the coordinates in
dimensions with low logarithmic potential converge to a non-optimal value, then the optimal value
for the coordinates which are optimizing is probably not the same as in the case where all
dimensions are optimizing. For example, the function
$f(x, y) = 2(x + y)^2 + (x - y)^2 = 3x^2 + 2xy + 3y^2$ is minimal for $x = 0$ and $y = 0$, but if
$x$ tends to 1, then $f(1, y) = 3y^2 + 2y + 3$ is minimal for $y = -\frac{1}{3} \ne 0$.
Nevertheless, the effect is the same for functions which are not composite, because the influence
of the dimensions with low logarithmic potential is heavily delayed. The scales of the change in
stagnating dimensions and non-stagnating dimensions are very different, and therefore the change in
the optimal position is recognized by the optimizing dimensions only later in the process, when the
stagnating dimensions are no longer able to change the optimal position and their own position in a
comparable manner.
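The shift of the optimum in this example can be verified with a few lines. The sketch below uses the function from the text; the helper `best_y_for_fixed_x` is introduced here and simply solves $\partial f / \partial y = 2x + 6y = 0$.

```python
def f(x, y):
    """Example objective from the text: 2(x+y)^2 + (x-y)^2 = 3x^2 + 2xy + 3y^2."""
    return 2.0 * (x + y) ** 2 + (x - y) ** 2


def best_y_for_fixed_x(x):
    """Minimizer of y -> f(x, y); setting the derivative 2x + 6y to zero gives y = -x/3."""
    return -x / 3.0


if __name__ == "__main__":
    y_star = best_y_for_fixed_x(1.0)
    print(y_star, f(1.0, y_star), f(1.0, 0.0))
```

With $x$ held at 1, the best $y$ is $-1/3$ and the value there is lower than at $y = 0$, so a coordinate that stagnates at a non-optimal value moves the optimum seen by the remaining coordinates.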

For example, if the logarithmic potential of a dimension $d$ stays smaller than $-100$, then (for
the specified experimental potential and the tested functions) the velocities in dimension $d$ are
also many orders of magnitude smaller than the velocities in the dimensions with the most
potential. If an attractor is updated, then its $d$-th position entry changes a bit. This change
might result in a minor change of the optimal position for all other dimensions if the position
value in dimension $d$ is regarded as a constant value, i.e., let
$f_c(x_1, \dots, x_{d-1}, x_{d+1}, \dots, x_D) := f(x_1, \dots, x_{d-1}, c, x_{d+1}, \dots, x_D)$,
let $c_{old}$ be the previous value of the $d$-th position entry and let $c_{new}$ be the new value
of the $d$-th position entry, then the optimal positions of $f_{c_{old}}$ and $f_{c_{new}}$ differ
a bit. At that moment the attractor choice hardly depends on that change, but if the swarm finally
converges, then the velocity in the dimensions with the most logarithmic potential finally reaches
the same scale as the previous change of the optimal position where the position value in dimension
$d$ is regarded as a constant. By then the velocity in dimension $d$ has also decreased and is again
much smaller. Therefore it cannot undo the changes made many steps in the past, when the velocity
in dimension $d$ had the same scale as the dimensions with large logarithmic potential have now.
This is what is meant by the heavily delayed influence of dimensions with low logarithmic
potential. As the movement in dimension $d$ was at no time guided to the optimal point in this
dimension, it is almost impossible that this dimension is optimizing.

With Assumption 3 the main theorem of this paper can be proved.

Theorem 2. Let $\Phi$ be a potential and let $c_0 < c_s < 0$, $\Delta t > 0$, $N_0 > 0$ be
constants such that the preceding assumptions are applicable. Furthermore, let $T > 0$, $\mu < 0$,
$M < \infty$ and $p_0 > 0$ be constants. Let $A_{X,i} := \{\tilde\beta_{i-1} < \infty\}$ for all
$i$. If the following conditions hold:

• The swarm converges almost surely,

• the objective function $f$ has only a countable number of local optima and

• for all $i$ the expectation $\mathrm{E}[X_i \mid A_{X,i}] \le T$ if $\mathrm{P}(A_{X,i}) > 0$,

then the swarm converges to a point which is not a local optimum, almost surely.




Proof. If the final phase, which has type $PH_F$, appears, then the PSO will not converge to a
local optimum for the reasons explained below Assumption 3. Now all that needs to be shown is that
a phase of type $PH_F$ appears almost surely. This is true if it can be shown that the expected
value of $T_F$, the starting time of the phase of type $PH_F$, is finite.

Let $A_{X,i} := \{\tilde\beta_{i-1} < \infty\}$, the event that the $i$-th instance of a phase of
type $PH_X$ appears, let $A_{Y,i} := \{\tilde\beta_i < \infty\}$, the event that the $i$-th
instance of a phase of type $PH_Y$ appears, and let $A_{S,i} := \{\tilde\alpha_i < \infty\}$, the
event that the $i$-th instance of a good $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation
phase appears. Let $N_X$ be $\sup\{i \in \mathbb{N} : \mathrm{P}(A_{X,i}) > 0\}$ and let $N_Y$ be
$\sup\{i \in \mathbb{N} : \mathrm{P}(A_{Y,i}) > 0\}$. Both values can be infinity. Let $S_{0,i}$ be
the random variable which represents the starting subset of the $i$-th good
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase, or the empty set if this phase
does not occur. Let $I_{t,d,i}$ be $B_{t,\Gamma_{B,i}} + J_{t,d,\Gamma_{J,i}}$, similar to the
proof of Theorem 1, such that $\Gamma_{B,i}$ and $\Gamma_{J,i}$ are the random probability
distributions of the $i$-th good $(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase.
The random probability distributions are introduced in Assumption 1. If the $i$-th good
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase does not exist, then $\Gamma_{B,i}$
is defined to be a Dirac impulse on $\mu$ and $\Gamma_{J,i}$ is defined to be a Dirac impulse on
zero. Additionally, let $\mathcal{A}$ be the $\sigma$-algebra generated by the family of
$\sigma$-algebras $\{\mathcal{A}_t\}_{t \in \mathbb{N}}$, where $\mathcal{A}_t$ is the natural
filtration of the PSO as specified in its definition. This $\sigma$-algebra is needed because for
any time $t$ it is not possible to decide whether a stagnation phase which is active at time $t$
has type $PH_F$ or $PH_Y$. Therefore $A_{Y,i}$ is not necessarily contained in $\mathcal{A}_t$ for
any $t$. The conditional expectation of $X_i$, the duration of the $i$-th phase of type $PH_X$, is
by the requirements of this theorem bounded by

$$\mathrm{E}[X_i \mid A_{X,i}] \le T$$

if the event $A_{X,i}$ has positive probability. Let $T_{S,i}$ be the random variable such that
$T_{S,i} \cdot \Delta t$ represents the time of the start of the $i$-th stagnation phase, which has
either type $PH_Y$ or $PH_F$. If the $i$-th stagnation phase does not occur, then $T_{S,i}$ is set
to infinity. As already specified, $Y_i$ represents the duration of the $i$-th phase of type $PH_Y$
if that phase exists and zero otherwise. Let $p$ be defined as $p(N_0, \mu, M, p_0)$, the lower
bound from Theorem 1 of the probability that a good
$(\Delta t, \Phi, N_0, c_0, c_s, \mu, M, p_0)$-stagnation phase does not end.

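$$
\begin{aligned}
\mathrm{E}[T_F] &= \sum_{i=0}^{N_X} \mathrm{P}(A_{X,i})\,\mathrm{E}[X_i \mid A_{X,i}] + \sum_{i=0}^{N_Y} \mathrm{P}(A_{Y,i})\,\mathrm{E}[Y_i \mid A_{Y,i}] \\
&= \sum_{i=0}^{N_X} \mathrm{P}(A_{X,i})\,\mathrm{E}[X_i \mid A_{X,i}] + \sum_{i=0}^{N_Y} \mathrm{P}(A_{S,i})\,\mathrm{E}[Y_i \mid A_{S,i}]
\end{aligned}
$$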

compare with end of stagnation phase (Definition 

Nx Ny OO 

= XlP(-4^,.)E|V.|*lx.d + EP(As,)j; tAt ■ 

2=0 2 = 0 t = l 

•P(l{(i G 5o,* : sup '^{Ts^i + i,d) < cj] > Nq 

0<i<t 

Al{d G So,* : sup ^{Ts,i +i,d) < cj] < A 0 IA 5 ,*) 

0<i<t 
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Nx Ny oo 

< Y.{i-pyE[x,\Ax,] + J2i^-pyY.*At- 

i=0 i=0 t=l 

•P(|{(i G S'o,* : sup ^ < 0}| > iVo A |{(i G S'o.i : sup ^ If Ai < 0}\ < No\As,i) 

0<i<t t'=Ts,i 0<i<t t'=Ts,i 

OO Ny OO ^S,i+^ 

< E(i-P)‘r+E(i-P)-E tAt ■ F{3d G S'o,i : sup ^ IfAi < 0 A ^ 
□

This is the final statement of the theoretical part. The proof of Theorem 2 not only shows that
the final phase of type $PH_F$ appears almost surely, but it also verifies that the expected start
of the final phase is a finite value. In the experimental part, evidence is now provided that those
conditions truly occur, and therefore that the PSO almost surely does not converge to a local
optimum for some well-known benchmarks and commonly used parameters of the PSO.

4 Experimental Results


Since we are interested in the behavior of the swarm for $t \to \infty$, we let the swarm perform a
very large number of iterations. During the ongoing process, the absolute values of the velocities
and the change in the objective function tend to zero. Therefore calculations with double precision
are not sufficiently precise. Instead, we used the mpf_t data type of the GNU Multiple Precision
Arithmetic Library, which supplies arbitrary precision. Initially we start with a precision of 2000
bits for the mantissa and increase the precision on demand, i.e., on every addition and subtraction
we check whether the current precision needs to be increased. The constants of the PSO are set to
commonly used values, i.e., $\chi = 0.72984$ and $c_1 = c_2 = 1.496172$, as proposed in [4]. The
classical PSO is used as specified in its definition. Additionally, the PSO is given as pseudocode
in the corresponding algorithm listing. No bound handling procedure is used; particles move through
the space without any borders. We checked our results with some benchmarks of [19] and an
additional function. In detail, we investigate the sphere function, the high conditioned elliptic
function and Schwefel's problem (see [19] for detailed problem descriptions). Additionally, the
diagonal function is added, whose second derivative matrix (Hessian) has a single heavy eigenvalue
with the corresponding eigenvector oriented diagonally.
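For orientation, the setup just described can be sketched in a few lines. This is a minimal sketch in ordinary double precision, not the enclosed C++ program: it cannot reproduce the long runs of the experiments, for which arbitrary precision is required. It assumes that the global attractor is updated immediately after each particle's move, and all function and variable names are illustrative.

```python
import random

CHI, C1, C2 = 0.72984, 1.496172, 1.496172  # parameter values quoted in the text


def sphere(x):
    return sum(xi * xi for xi in x)


def run_pso(f, dim, n_particles, steps, seed=0, lo=-100.0, hi=100.0):
    """Classical PSO sketch without bound handling; returns (best position, best value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]  # zero velocity initialization
    local = [p[:] for p in pos]                      # local attractors
    local_val = [f(p) for p in pos]
    g = min(range(n_particles), key=local_val.__getitem__)  # index of global attractor
    for _ in range(steps):
        for n in range(n_particles):
            for d in range(dim):
                r, s = rng.random(), rng.random()
                vel[n][d] = (CHI * vel[n][d]
                             + C1 * r * (local[n][d] - pos[n][d])
                             + C2 * s * (local[g][d] - pos[n][d]))
                pos[n][d] += vel[n][d]  # no borders: positions are never clamped
            val = f(pos[n])
            if val < local_val[n]:
                local[n], local_val[n] = pos[n][:], val
                if val < local_val[g]:
                    g = n  # assumption: global attractor is updated right away
    return local[g][:], local_val[g]


if __name__ == "__main__":
    best_pos, best_val = run_pso(sphere, dim=10, n_particles=3, steps=1000)
    print(best_val)
```

In double precision the velocities of such a run eventually fall below what can be resolved relative to the positions, which is the reason the experiments rely on arbitrary precision arithmetic.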

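Definition 12 (Objective functions). The formal definitions of the used objective functions are

• $f_{sph}(x) := \sum_{i=1}^{D} x_i^2$, the sphere function,

• $f_{hce}(x) := \sum_{i=1}^{D} (10^6)^{\frac{i-1}{D-1}} \cdot x_i^2$, the high conditioned elliptic function,

• $f_{sch}(x) := \sum_{i=1}^{D} \big(\sum_{j=1}^{i} x_j\big)^2$, Schwefel's problem and

• $f_{diag}(x) := \sum_{i=1}^{D} x_i^2 + 10^6 \cdot \big(\sum_{i=1}^{D} x_i\big)^2$, the diagonal function.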

The optimal position for all of these functions is the origin and the optimal value is zero. 
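The four objective functions can be sketched as follows. The formulas used here are the usual unshifted benchmark forms (sphere, high conditioned elliptic with condition number $10^6$, Schwefel's problem 1.2, and a sphere plus $10^6$ times the squared coordinate sum); these exact forms are an assumption of this sketch and should be checked against [19] and the enclosed program.

```python
def f_sph(x):
    """Sphere function: sum of squares."""
    return sum(xi * xi for xi in x)


def f_hce(x):
    """High conditioned elliptic function with weights (10^6)^((i-1)/(D-1))."""
    d = len(x)
    if d == 1:
        return x[0] * x[0]
    return sum((1e6 ** (i / (d - 1))) * xi * xi for i, xi in enumerate(x))


def f_sch(x):
    """Schwefel's problem 1.2: sum of squared prefix sums."""
    total, prefix = 0.0, 0.0
    for xi in x:
        prefix += xi
        total += prefix * prefix
    return total


def f_diag(x):
    """Diagonal function: sphere plus 10^6 times the squared sum of all coordinates."""
    s = sum(x)
    return sum(xi * xi for xi in x) + 1e6 * s * s


if __name__ == "__main__":
    origin = [0.0] * 5
    print(f_sph(origin), f_hce(origin), f_sch(origin), f_diag(origin))
```

The heavy term of `f_diag` only penalizes movement along the direction $(1, \dots, 1)$, which corresponds to the single heavy eigenvalue with a diagonally oriented eigenvector mentioned in the text.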

4.1 Used Software 

For arbitrary precision the mpf_t data type of the GNU Multiple Precision Arithmetic Library,
version 4.3.2, is used. The program which simulates the PSO is implemented in C++ and uses the
mpf_t data type (program code is enclosed). The program runs on openSUSE 12.3 and uses the
GNU Compiler Collection, version 4.3.3. For the creation of diagrams, plot2svg by Jürg Schwizer
within Matlab R2013b and Inkscape 0.48 are used.

4.2 Expectation of increments for various scenarios 

This section motivates stagnation phases with a specific number of stagnating dimensions.
Firstly, some typical runs of the PSO are presented, which illustrate that for a combination of
function and number of particles there is always a fixed number of dimensions which are not
stagnating. Secondly, the expectations of the increments are visualized to support this
impression.

Figure 4 shows a typical run of the PSO on the sphere function with three particles and 10
dimensions. All positions are initialized uniformly at random in the 10-dimensional cube
$[-100.0, 100.0]^{10}$. Additionally, zero velocity initialization is used, which means that all
entries of all velocity vectors are initialized with the value zero. Other velocity initializations
have also been investigated, but the results are the same. The initialization of positions
uniformly at random in a cube and with zero velocity will be referred to as usual initialization,
because this is a common initialization for PSO algorithms. Figure 4 contains the logarithm of the
potential, the logarithmic potential and the logarithm of the distance to the origin, which is the
only optimal point. For some periods of time there are some dimensions which have low logarithmic
potential, and for the same dimensions the logarithm of the distance to the origin stays almost
constant. If one of those dimensions regains potential, then all dimensions regain potential, but
the potential of dimensions which already have low logarithmic potential increases much more slowly
than that of the others. Therefore their logarithmic potential decreases. In Figure 4, at
approximately time steps 40 000 and 75 000, a dimension which had lower logarithmic potential than
other dimensions regains potential. At the same times the logarithmic potential of the remaining
dimensions which already had low logarithmic potential decreases significantly. While the potential
increases in dimensions which previously had large logarithmic potential, the respective positions
may become worse. Finally, there are three dimensions which have a significantly smaller
logarithmic potential than the other dimensions. The other 7 dimensions are optimizing. There are
always short periods of time in which one of the 7 dimensions has an almost constant distance to
the origin, as can be seen in Figure 4. These periods can be so long that one of the three
dimensions with low logarithmic potential regains potential, but as the logarithmic potential of
the lower three dimensions continuously decreases, it becomes more and more unlikely that
dimensions change from low logarithmic potential to high logarithmic potential. An interesting fact
is that the number of optimizing dimensions in these experiments stays the same, even if the number
of dimensions of the domain changes. For the sphere function and three particles, finally
$\max(0, D - 7)$ dimensions appear which have significantly smaller logarithmic potential than the
other dimensions.

Figure 4: (a) Potential, (b) logarithmic potential and (c) distance to optimum, with the potential
as specified in the corresponding definition, on the sphere function in 10 dimensions with 3
particles; each line represents a single dimension; usual initialization.

Figure 5: (a) Potential, (b) logarithmic potential and (c) distance to optimum, with the potential
as specified in the corresponding definition, on the sphere function in 10 dimensions with 3
particles; each line represents a single dimension; initialization with the stagnation
initialization algorithm.

Figure illustrates another run of the PSO on the sphere function with three particles and
10 dimensions. For this run the positions are initialized as specified in Algorithm. This
algorithm first initializes all D dimensions, as previously described, uniformly at random in the
interval [-100.0, 100.0]. The dimensions d* to d* + L - 1 are then reinitialized with some random
center ($Y \in [-100.0, 100.0]$) and some random noise in the range
$[-100.0 \cdot 2^{-S}, 100.0 \cdot 2^{-S}]$. This initialization simulates stagnation of L consecutive
dimensions with initial logarithmic potential of approximately -S. The chosen parameters for the
run visualized in Figure are N = 3 particles, D = 10 dimensions, a scale of S = 200, L = 9
initially stagnating dimensions and d* = 1 as the index of the first stagnating dimension.

Another option would have been scaling all position values of the first L dimensions with $2^{-S}$,
but this is problematic. On the one hand, the value of the initial potential is then mostly much
less than -S, because not only the velocities are reduced, but also the derivative is smaller in the
neighborhood of the optimum. The reduction of the derivative cannot be controlled with some
general formula, because it depends on the objective function. On the other hand, for this
initialization the logarithmic potential used in this paper does not act as proposed in the
assumptions of Section on functions which are not composite, like $f_{sch}$ and $f_{diag}$. This is due
to the fact that the derivative of those objective functions in stagnating dimensions depends also
on the non-stagnating dimensions. If the part which outweighs the other changes, then the
distribution of the increments also changes significantly during a stagnation phase. This change
appears for the functions $f_{sch}$ and $f_{diag}$ when the positions of non-stagnating dimensions reach
values in the interval $[-100.0 \cdot 2^{-S}, 100.0 \cdot 2^{-S}]$.
Algorithm 2: Special position initialization
input : number of particles N, number of dimensions D, scale S, number of dimensions
        with low logarithmic potential L, first dimension with low logarithmic potential d*
output: a vector of initial positions for each particle, $X \in (\mathbb{R}^D)^N$

    /* rand(a,b) supplies a uniform random value in [a,b] */
    for d := 1 to D do
        for n := 1 to N do
            X[n][d] := rand(-100.0, 100.0);
        end
    end
    for d := d* to d* + L - 1 do
        Y := rand(-100.0, 100.0);
        for n := 1 to N do
            X[n][d] := X[n][d] · 2^(-S) + Y;
        end
    end
    return X;
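A minimal Python sketch of Algorithm 2 follows. It is not the authors' implementation: the function name `special_position_init`, the zero-based dimension index and the use of `fractions.Fraction` are assumptions made here. Exact rationals are used because, with S = 200, noise of size 100.0 · 2^(-S) is far below double precision and would vanish in ordinary floating point arithmetic as soon as the center Y is added.

```python
import random
from fractions import Fraction


def special_position_init(N, D, S, L, d_star, rng=None):
    """Sketch of Algorithm 2; returns X with X[n][d] for particle n, dimension d.

    Dimensions are 0-based here, so the paper's d* corresponds to index d_star - 1.
    """
    rng = rng or random.Random()
    # Usual initialization: every entry uniform in [-100, 100].
    X = [[Fraction(rng.uniform(-100.0, 100.0)) for _ in range(D)]
         for _ in range(N)]
    shrink = Fraction(1, 2 ** S)  # exact value of 2^(-S)
    for d in range(d_star - 1, d_star - 1 + L):
        Y = Fraction(rng.uniform(-100.0, 100.0))  # common random center
        for n in range(N):
            # Keep only noise of size at most 100 * 2^(-S) around the center Y.
            X[n][d] = X[n][d] * shrink + Y
    return X


X = special_position_init(N=3, D=10, S=200, L=9, d_star=1, rng=random.Random(0))
# Spread of the particles per dimension (maximum minus minimum coordinate).
spread = [max(p[d] for p in X) - min(p[d] for p in X) for d in range(10)]
```

With these parameters the particles differ by at most 200 · 2^(-200) in each of the first nine dimensions, while the tenth dimension keeps the spread of the usual initialization.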


Example 1. The function $f(x, y) = x^2 + (x + y)^2$, the Schwefel’s problem in two dimensions, is
analyzed. The first derivative equals (4x + 2y, 2x + 2y). If y = -2, then the optimal value for x is
1, because then the first entry of the first derivative is zero. If x = 100 + 1, then all entries of the
first derivative are mainly determined by the value of x. If the difference of x to the optimal value is
decreased by a factor of 100 to x = 1 + 1, then the first derivative is also decreased approximately
by a factor of 100. If the difference of x to the optimal value is again decreased by a factor of 100
to x = 0.01 + 1, then the first entry of the first derivative is also decreased by a factor of 100, but the
second entry stays quite constant. Therefore the second entry of the first derivative is now mainly
determined by y. The potential recognizes this change, too.
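The numbers in Example 1 can be checked with a few lines of Python. This is only an illustrative sketch; the function name `gradient` is chosen here. Writing x = 1 + δ with y = -2, the gradient equals (4δ, 2δ - 2): the first entry scales with the distance δ to the optimal x, while the second entry passes through zero near δ = 1 and then settles near -2, a value fixed by y alone.

```python
def gradient(x, y):
    """Gradient of f(x, y) = x**2 + (x + y)**2."""
    return (4 * x + 2 * y, 2 * x + 2 * y)


y = -2
for delta in (100, 1, 0.01):
    gx, gy = gradient(1 + delta, y)
    print(f"x = 1 + {delta}: gradient = ({gx:.4g}, {gy:.4g})")
```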

If the positions of the stagnating dimensions were initialized small compared to the other
dimensions, then the position values of the stagnating dimensions would stay constant while the
position values of non-stagnating dimensions tend to their optimal value. Initially the complete
first derivative is determined by the dimensions with larger position values. In the beginning the
absolute values of all entries of the first derivative are decreasing, but finally the entries of the
first derivative for stagnating dimensions stay constant. Therefore the behavior changes
significantly once this border is crossed. Certainly such situations may appear after initialization
or during a PSO run, but it is very unlikely that the particles are initialized in that way if standard
initializations are used, and it is very unlikely that stagnating dimensions become much more
optimized than non-stagnating dimensions during a PSO run. The objective functions $f_{sph}$ and
$f_{hce}$ do not face this problem, because the derivative in a single dimension depends only on that
dimension itself.

For the PSO run in Figure, 9 of 10 dimensions are initialized as stagnating dimensions with
logarithmic potential -200. This figure illustrates that if fewer than seven dimensions are
non-stagnating, then the logarithmic potential of stagnating dimensions is increasing. The smaller
the number of non-stagnating dimensions is, the larger is the increase of the logarithmic potential.
Finally, one of the stagnating dimensions becomes significant. This results in a running phase in that
dimension. Running phases are introduced in [16, 17, 18]. If the swarm is running in a dimension
d, then the coordinate value of dimension d determines the local attractor and the influence of the
other dimensions can be neglected. All velocities in direction d are either all positive or all negative,
the local attractors are updated each step and the global attractor is updated at least once each
iteration. Dimension d is heavily improved during this phase. The running phase terminates as
soon as walking in dimension d does not lead to a further improvement of the function value. During
this running phase, the previously non-stagnating dimensions and the running dimension regain
potential. The other dimensions also experience an increase in potential, but this increase is much
smaller than that of the running dimension and the non-stagnating dimensions. Therefore their
logarithmic potential decreases. This behavior repeats until seven dimensions are non-stagnating.
The largest running phase in Figure occurs approximately at step 19 000.

The PSO runs visualized in Figures and do not look as linear as proposed. This is due
to the fact that they do not show continuous stagnation phases. Figure represents the extended
run of Figure with a hundred times more iterations. The linear drift of the logarithmic potential
can easily be recognized in this figure. Furthermore, the non-stagnating dimensions can hardly be
distinguished, because their variance is not larger than with fewer iterations. As the scale
has increased, the variance looks smaller.

Definition 13. For an evaluated random variable of the r’th test run, we write the original random
variable in angle brackets with index r. For example, $\langle A \rangle_r$ is written for the evaluated
random variable with name A in test run r.


Experiment 1. The first statistical analysis measures the expected alteration of the logarithm of the
potential and of the logarithmic potential for various scenarios. All functions are evaluated with D =
10 dimensions and 100 000 iterations. The stagnating dimensions, the number of particles N and
the objective functions vary. Algorithm is used for position initialization and all velocities are
initially set to zero. The initial logarithmic potential is adjusted such that in no test run the
logarithmic potential of any stagnating dimension reaches a value of -100 or more within the
100 000 iterations. Each tested configuration is started R = 500 times with different seeds. Let $D_S$
be the set of stagnating dimensions, which is fixed for each specific configuration. Let $T_m$ be 50 000
and let $T_e$ be 100 000, the numbers of time steps after half of and after the full evaluated process.

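The following estimators are defined:

$$\mu_U := \frac{1}{R (T_e - T_m)} \sum_{r=1}^{R} \left\langle \max_{d \notin D_S} \log \Phi(T_e, d) - \max_{d \notin D_S} \log \Phi(T_m, d) \right\rangle_r$$

$$\mu_M := \frac{1}{R (T_e - T_m)} \sum_{r=1}^{R} \left\langle \min_{d \notin D_S} \log \Phi(T_e, d) - \min_{d \notin D_S} \log \Phi(T_m, d) \right\rangle_r$$

$$\mu_D := \frac{1}{R \cdot |D_S| \cdot (T_e - T_m)} \sum_{r=1}^{R} \sum_{d \in D_S} \left\langle \log \Phi(T_e, d) - \log \Phi(T_m, d) \right\rangle_r$$

$$\mu_L := \mu_D - \mu_U.$$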
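The four estimators can be sketched in Python as follows. The sketch assumes that each estimator averages the change of the logarithm of the potential between time steps T_m and T_e over all runs, as the description of the estimators suggests; the function name `drift_estimators` and the list-based data layout are hypothetical and not taken from the paper.

```python
def drift_estimators(log_phi_mid, log_phi_end, stagnating, t_mid, t_end):
    """Estimate (mu_U, mu_M, mu_D, mu_L) from logged potentials.

    log_phi_mid[r][d] and log_phi_end[r][d] hold the logarithm of the
    potential of dimension d in run r at time steps t_mid and t_end.
    `stagnating` is the set D_S of (0-based) stagnating dimensions.
    """
    runs = len(log_phi_mid)
    active = [d for d in range(len(log_phi_mid[0])) if d not in stagnating]
    steps = runs * (t_end - t_mid)
    pairs = list(zip(log_phi_mid, log_phi_end))
    # Drift of the largest log potential among non-stagnating dimensions.
    mu_u = sum(max(e[d] for d in active) - max(m[d] for d in active)
               for m, e in pairs) / steps
    # Drift of the smallest log potential among non-stagnating dimensions.
    mu_m = sum(min(e[d] for d in active) - min(m[d] for d in active)
               for m, e in pairs) / steps
    # Average drift of the log potential over the stagnating dimensions.
    mu_d = sum(e[d] - m[d] for m, e in pairs
               for d in stagnating) / (steps * len(stagnating))
    return mu_u, mu_m, mu_d, mu_d - mu_u
```

A negative fourth value means that, on average, the stagnating dimensions lose potential faster than the most significant dimension.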


Figure 6: (a) Potential, (b) logarithmic potential and (c) distance to optimum with potential
as specified in Definition on sphere function in 10 dimensions and with 3 particles; each line
represents a single dimension, usual initialization

As linearity is available only in logarithmic scale, the logarithm of the potential $\Phi$ and the
logarithmic potential $\Psi$ are used. $\mu_U$ represents the average decrease or increase of the logarithm
of the potential of the most significant dimension per time step, $\mu_M$ represents the average decrease
or increase of the logarithm of the potential of the least significant dimension among the initially
non-stagnating dimensions per time step, $\mu_D$ represents the average decrease or increase of the
logarithm of the potential of stagnating dimensions per time step and $\mu_L$ represents the average
decrease or increase of the logarithmic potential of the stagnating dimensions per time step. It is
assumed that the first half of the iterations is sufficient for adequate mixing. Then the second half is
used for average calculation. Additionally the squared standard deviations are estimated for $\mu_U$,
$\mu_D$ and $\mu_L$ by
and $\sigma_U$, $\sigma_M$, $\sigma_D$ and $\sigma_L$ are defined as the square roots of the respective squared standard
deviations. The variance will be analyzed in Section 4.4 in detail. In Figure the measured decrease
of the logarithm of the potential for non-stagnating dimensions $\mu_U$, for stagnating dimensions
$\mu_D$, the measured change in logarithmic potential $\mu_L$ and their measured standard deviations
are visualized for the sphere function $f_{sph}$, N = 3 particles, D = 10 dimensions and for different
numbers of stagnating dimensions L. The decrease of the logarithm of the potential for
non-stagnating dimensions $\mu_U$ becomes smaller as the number of non-stagnating dimensions
increases. This is due to the fact that the more dimensions need to be optimized, the more time is
needed for optimization. The second columns represent the decrease of the logarithm of the
potential for stagnating dimensions $\mu_D$, which also becomes smaller, but not as fast as $\mu_U$.
The third columns, which represent the change of logarithmic potential $\mu_L$, are strongly positive
for only one non-stagnating dimension, but the change decreases while the number of
non-stagnating dimensions increases until finally the measured change becomes negative. If fewer
than 7 dimensions are non-stagnating and further stagnating dimensions are available, then the
logarithmic potential is likely to increase until at least one of the stagnating dimensions becomes
non-stagnating, because $\mu_L$ is positive in those cases.

Figure 7: Measured expectations $\mu_U$, $\mu_D$ and $\mu_L$ and their standard deviations for sphere function
$f_{sph}$, N = 3 particles, D = 10 dimensions, d* = 1 and variable number of stagnating dimensions L

Figures and visualize the expectations of the change in logarithmic potential per time step
for various scenarios. An obvious attribute of both graphs is that all functions perform similarly
if only one dimension is non-stagnating (L = 9). All functions mentioned in this paper have a
constant second derivative, which is a positive definite matrix, and third and higher derivatives
are zero. If only a single dimension is not stagnating, then the remaining function is represented
by a parabola with some scale. The remaining function is the function which remains after
all values of stagnating dimensions are replaced with the constants that represent their current
position. The scale of the parabola does not influence the behavior of the PSO, because the PSO only
evaluates whether a position is better or not, which does not depend on the scale. Certainly the
scale of a parabola has linear influence on $\Phi$, but the logarithmic potential is not affected, because
this scale is compensated by the subtraction in Definition.
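The compensation can be written out in one line. The following is a sketch under an assumption: the logarithmic potential is taken to be the logarithm of the potential minus the maximal logarithm of the potential over all dimensions, which matches the relation between the estimators above but is not quoted from the definition itself. The symbols $\Phi(t,d)$, $\Psi(t,d)$ and the scale factor $c > 0$ are notation chosen here.

```latex
% Assumption: \Psi(t,d) = \log\Phi(t,d) - \max_{d'} \log\Phi(t,d')
\begin{align*}
\log\bigl(c\,\Phi(t,d)\bigr) - \max_{d'} \log\bigl(c\,\Phi(t,d')\bigr)
  &= \log c + \log\Phi(t,d) - \Bigl(\log c + \max_{d'} \log\Phi(t,d')\Bigr) \\
  &= \log\Phi(t,d) - \max_{d'} \log\Phi(t,d') = \Psi(t,d).
\end{align*}
```

Under this assumption a common factor on all potentials shifts every logarithm by the same constant, which cancels in the difference.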

Furthermore, the sphere function $f_{sph}$ and the high conditioned elliptic function $f_{hce}$ have quite
similar measured expectations. The high conditioned elliptic function actually is a sphere function
scaled along the axes with different constant scales. It seems that the PSO is not affected if such
scaling is applied to the sphere function.

Other functions can behave differently. For the Schwefel’s problem the behavior even depends on
which dimensions are stagnating. To explain this property, examining the matrix representation of
the mentioned objective functions is helpful:

• $f_{sph}(x) = x^T \cdot A_{sph} \cdot x$, with $(A_{sph})_{i,j} = 1$ if $i = j$ and 0 else,

• $f_{hce}(x) = x^T \cdot A_{hce} \cdot x$, with $(A_{hce})_{i,j} = (10^6)^{\frac{i-1}{D-1}}$ if $i = j$ and 0 else,

• $f_{sch}(x) = x^T \cdot A_{sch} \cdot x$, with $(A_{sch})_{i,j} = D + 1 - \max(i, j)$ and

• $f_{diag}(x) = x^T \cdot A_{diag} \cdot x$, with $(A_{diag})_{i,j} = 10^6 + (A_{sph})_{i,j}$,
where $x^T$ represents the transposed vector of x. Important are the directions of the eigenvectors
of those matrices. All eigenvectors have only real values, because the matrices are symmetric
and positive semidefinite, which always is a property of second derivative matrices of objective
functions which are two times continuously differentiable, at local minima. For the sphere function
$f_{sph}$ and the high conditioned elliptic function $f_{hce}$ there exists an eigenvector basis which is
parallel to the coordinate axes. This is the main reason for their similar behavior, because particles
prefer to walk along the axes as proposed in [17]. In contrast, we have the diagonal function $f_{diag}$
with matrix $A_{diag}$, which has a single large eigenvalue ($D \cdot 10^6 + 1$), and the corresponding
eigenvector is oriented diagonally to all axes, i. e., the eigenvector is represented by
$(1, 1, \ldots, 1)^T \in \mathbb{R}^D$. The PSO performs poorly on this function. For 2, 3 and 4 particles only
two non-stagnating dimensions are necessary to generate considerably decreasing logarithmic
potential, while for the sphere function and the high conditioned elliptic function there are 3
non-stagnating dimensions if 2 particles are available and 7 non-stagnating dimensions if 3
particles are available until the measured expectation of logarithmic potential becomes negative.
For 4 or more particles operating on the sphere function or the high conditioned elliptic function
it is not even known whether there exists a finite bound for the number of non-stagnating
dimensions such that the expectation of logarithmic potential becomes negative. As the
logarithmic potential of the diagonal function $f_{diag}$ is already negative for two non-stagnating
dimensions, only the values for L = 9 and L = 8 are visualized in Figures and. Negative
expectation of logarithmic potential means that stagnating dimensions prefer to decrease their
logarithmic potential and therefore are more likely to stay stagnating.

Figure 8: Measured expectations of logarithmic potential and their standard deviations for
various functions, N = 2 particles, D = 10 dimensions and variable number of stagnating dimensions L
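The matrix representations can be checked numerically with a short Python sketch. Several points are assumptions made here rather than statements of the paper: the helper names, the constant 10^6 in the diagonal function, the coefficient D + 1 - max(i, j) for the Schwefel’s problem (the value that reproduces the two-dimensional case $x^2 + (x + y)^2$ of Example 1), and the general form of the Schwefel’s problem as a sum of squared partial sums.

```python
def a_sch(D):
    """Entry D + 1 - max(i, j) for 1-based i, j, i.e. D - max(i, j) for 0-based."""
    return [[D - max(i, j) for j in range(D)] for i in range(D)]


def a_diag(D, c=10**6):
    """Sphere matrix with the constant c added to every entry (c = 10^6 assumed)."""
    return [[c + (1 if i == j else 0) for j in range(D)] for i in range(D)]


def quad_form(A, x):
    """Evaluate x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def schwefel(x):
    """Schwefel's problem written as a sum of squared partial sums."""
    total, partial = 0, 0
    for value in x:
        partial += value
        total += partial ** 2
    return total


D = 5
x = [3, -1, 4, 1, -5]
# A_diag applied to (1, ..., 1)^T: every component is a row sum.
image = [sum(row) for row in a_diag(D)]
print(quad_form(a_sch(D), x) == schwefel(x))
print(image == [D * 10**6 + 1] * D)
```

The second check confirms that (1, ..., 1)^T is an eigenvector of the assumed $A_{diag}$ with eigenvalue D · 10^6 + 1.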
Figure 9: Measured expectations of logarithmic potential and their standard deviations for
various functions, N = 3 particles, D = 10 dimensions and variable number of stagnating dimensions L

If some dimensions are temporarily stagnating, then the functions appear like

$$(x - x^*)^T \cdot A \cdot (x - x^*) + \mathrm{const},$$

where all rows and columns of stagnating dimensions are removed and $x^*$ is the current optimal
point for the remaining dimensions. For composite functions this optimal point stays at the origin.
For other functions this is not necessarily the case. For the sphere function $f_{sph}$ and the diagonal
function $f_{diag}$ the remaining submatrix does not depend on which dimensions are stagnating, but
only on the number of stagnating dimensions. The submatrix of the high conditioned elliptic
function $f_{hce}$ depends on which dimensions are stagnating, but the remaining matrix is always
a diagonal matrix and therefore the PSO always performs similarly to the sphere function. This
implies that for this objective function the behavior also depends only on how many dimensions
are currently stagnating.

For the Schwefel’s problem the behavior depends heavily on which dimensions are stagnating. E.g., two
very different submatrices are the matrices where the first L dimensions are stagnating and where
the last L dimensions are stagnating. The effect on the potential is visualized in the Figures
and 10. If the first L dimensions are removed, then the remaining matrix is equal to the respective
matrix of the Schwefel’s problem in D - L dimensions. If the last L dimensions are removed,
then the remaining matrix is equal to the respective matrix of the Schwefel’s problem in D - L
dimensions plus the value L for each entry of the remaining matrix. This is similar
to the matrix of the diagonal function, where a value of $10^6$ is added to each entry of the matrix
of the sphere function. The more dimensions at the end are removed, the more pronounced a
mostly diagonal eigenvector becomes. Therefore the PSO performs much worse on the Schwefel’s
problem if the last L dimensions are stagnating than if the first L dimensions are stagnating. In
Figures and 10 this can be observed, because the change in logarithmic potential is much lower if
the last L dimensions are stagnating.

Figure 10: Measured expectations $\mu_U$, $\mu_M$ and $\mu_D$ and their standard deviations for Schwefel’s
problem $f_{sch}$, N = 3 particles, D = 10 dimensions and variable number of stagnating dimensions L
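The two submatrix statements for the Schwefel’s problem can be verified directly. The sketch below assumes the coefficient D + 1 - max(i, j) for 1-based indices (the value consistent with Example 1); the helper names `a_sch` and `submatrix` are chosen here for illustration.

```python
def a_sch(D):
    """Schwefel matrix: D + 1 - max(i, j) for 1-based i, j (0-based: D - max(i, j))."""
    return [[D - max(i, j) for j in range(D)] for i in range(D)]


def submatrix(A, keep):
    """Keep only the rows and columns whose indices are listed in `keep`."""
    return [[A[i][j] for j in keep] for i in keep]


D, L = 10, 4
A = a_sch(D)
small = a_sch(D - L)

# First L dimensions stagnating: exactly the Schwefel matrix in D - L dimensions.
first_removed = submatrix(A, range(L, D))
# Last L dimensions stagnating: the same matrix with L added to every entry.
last_removed = submatrix(A, range(D - L))

print(first_removed == small)
print(last_removed == [[value + L for value in row] for row in small])
```

The constant offset L in the second case plays the same role as the large constant added to every entry in the diagonal function.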

Figure 10 even shows that the final number of stagnating dimensions can differ if different
dimensions are stagnating. It shows the measured expectations of the change in potential of the
largest potential value, the lowest potential value of the initially non-stagnating dimensions and the
average of the initially stagnating dimensions. If the number of non-stagnating dimensions is small,
then $\mu_M$ is similar to $\mu_U$, which means that the D - L initially non-stagnating dimensions stay
non-stagnating. Finally the value of $\mu_M$ is similar to $\mu_D$, which means that at least one of the
initially non-stagnating dimensions becomes stagnating. Actually this change should happen as
soon as $\mu_U$ is already larger than $\mu_D$ for fewer non-stagnating dimensions, because then there
exists a dimension such that, if this dimension also becomes stagnating, the logarithmic potential of
stagnating dimensions is more likely to decrease. In Figure 10, $\mu_U$ is larger than $\mu_D$ for d* = 1 and
L = 5. As expected, $\mu_M$ is similar to $\mu_D$ for d* = 1 and L < 5. For d* = D - L + 1 this is not
that clear. $\mu_U$ is slightly larger than $\mu_D$ for L = 6, but as the resulting decrease of logarithmic
potential is very small, it may happen through random fluctuation that all initially non-stagnating
dimensions regain potential from time to time. Therefore $\mu_M$ only slightly differs from $\mu_U$ for
L = 5. If the number of iterations is heavily increased, then it will be more likely that one of the
initially non-stagnating dimensions becomes stagnating.
Detailed results on the exact values of $\mu_U$, $\mu_M$, $\mu_D$ and $\mu_L$ of Experiment are listed in
Appendix


4.3 Verification of the Assumption of Separation 


In this section evidence is provided for the separation of the increments $I_{t,d}$ into the two parts
$B_t$ and $J_{t,d}$, each additionally indexed by its distribution, as proposed in Assumption. The
following experiment supplies information about the PSO during stagnation phases.


Experiment 2. All functions are evaluated with D = 100 dimensions and 200 000 iterations. The
number of particles N and the objective functions vary. Algorithm is used for position
initialization and all velocities are initially set to zero. L, the number of initially stagnating
dimensions, and d*, the index of the first stagnating dimension, are chosen such that the
expectation of the increments of stagnating dimensions is just becoming negative. S, the initial
scale of the stagnating dimensions, is set to 500. In no test run the logarithmic potential of any
stagnating dimension reaches a value of -100 or more within the 200 000 iterations. Each tested
configuration is started R = 500 times with different seeds. Let $D_S$ be the set of stagnating
dimensions, which is fixed for each specific configuration. Let $T_m$ be 100 000 and let $T_e$ be
200 000, the numbers of time steps after half of and after the full evaluated process. $\Delta t$ is set
to one for this experiment.

The number of dimensions is chosen so large that it can be assumed that $B_t$, the
base increment of all stagnating dimensions, can be approximated by the average of all increments
of stagnating dimensions. The initialization with Algorithm and specific parameters L and d*
leads to a fixed set of stagnating dimensions. As discussed earlier, the set of stagnating dimensions
fixes the expectation of the increments, but the set of stagnating dimensions also fixes the
distributions of the increments, because the remaining function is always the same and therefore the
PSO performs similarly. Therefore it is assumed that the distributions in the index of B and J are
constant distributions for a single scenario. For reasons of readability, the constant distributions
are removed from the index of the estimators for B and J. The first half of the iterations is again
used for sufficient mixing. With this experiment large sums of increments are investigated.
Therefore $\Delta t$ is set to one for this experiment, because larger values of $\Delta t$ are implicitly
covered by larger numbers of accumulated increments. Furthermore, no scaling of the time axis is
present with this value of $\Delta t$. The following estimators for B and J in the r’th run and time
step t are used:

$$\langle B_t \rangle_r := \sum_{d \in D_S} \langle I_{t,d} \rangle_r / |D_S|$$

$$\langle J_{t,d} \rangle_r := \langle I_{t,d} \rangle_r - \langle B_t \rangle_r$$
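The two estimators amount to removing the mean over the stagnating dimensions. A minimal Python sketch for one run and one time step follows; the function name `split_increments` and the sample values are illustrative assumptions.

```python
def split_increments(increments):
    """Split the increments I_{t,d} of one run and one time step.

    `increments` holds one value per stagnating dimension d in D_S.
    Returns the base increment <B_t>_r and the list of <J_{t,d}>_r.
    """
    base = sum(increments) / len(increments)
    return base, [value - base for value in increments]


base, dim_parts = split_increments([-0.5, 0.25, -1.0, 0.75])
print(base, dim_parts, sum(dim_parts))
```

By construction the dimension dependent parts of one time step sum to zero over the stagnating dimensions, so any average over d of a product of the base increment with these parts vanishes as well. This is consistent with the remark below that the covariance estimator always evaluates to zero.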

Additionally the following helper variables for sums of $\tau$ values are defined:
The estimators for the expectation are calculated over the complete second half to receive a reliable
value as in the previous experiment:

$$\mu_I := \mu_B := \frac{1}{R \cdot (T_e - T_m)} \sum_{r=1}^{R} \sum_{t=T_m}^{T_e-1} \langle B_t \rangle_r$$

As proposed in Assumption the measured expectation of J is zero: 
A further estimator is introduced to measure the covariance (cov) of the sums
$\sum_{t=T_m}^{T_m+\tau-1} B_t$ and $\sum_{t=T_m}^{T_m+\tau-1} J_{t,d}$:

If B and J are independent, then their covariance needs to be zero. Actually the estimator
$\mathrm{cov}_{B,J,\tau}$ for this covariance always evaluates to zero, because
Furthermore, this idea can be expanded. We use

as approximation of the base increment $B_t$. This conditional expectation represents the average
increment of stagnating dimensions after the movement in non-stagnating dimensions is determined,
i. e., and the random variables of the movement equations defined in Definition
for non-stagnating dimensions, and all information about previous steps are known. Even for the
sum of $\tau$ increments

is a possible approximation of $\sum_{t=T_m}^{T_m+\tau-1} J_{t,d}$. Other choices for $\mathcal{A}'_t$ and
$\mathcal{A}'_{T_m}$ are also possible, but the inclusions $\mathcal{A}_t \subseteq \mathcal{A}'_t \subseteq \mathcal{A}_{t+1}$ and
$\mathcal{A}_{T_m} \subseteq \mathcal{A}'_{T_m} \subseteq \mathcal{A}_{T_m+\tau}$ need to be satisfied to grant the
specified measurability of Assumption. With similar calculations as with the estimators, the
actual expectation of $J_t$ and the covariance of the random variables $B_t$ and $J_t$ are zero if the proposed
approximations of the random variables B are accepted. As presented in Example the derivative
in stagnating dimensions becomes quite constant if the coordinate values of stagnating dimensions
become constant. In Figures and it is apparent that even temporarily stagnating dimensions
do not change their position values significantly while they are stagnating. As the logarithm of
the potential of all dimensions tends to minus infinity, as presented in Figure 6a, the absolute
values of the velocities in all dimensions need to tend to zero. Therefore the potential in stagnating
dimensions can be calculated as the product of the absolute value of the velocity in that
dimension and the approximately constant derivative in that dimension. As the increments are
calculated as differences of logarithmic potential values, only the portion of the velocity and the
portion of the dimension with maximal potential remain. Hence the increments of all stagnating
dimensions are only determined by the velocities in that dimension. The portion of the dimension
with maximal potential is completely included in the base increment, and therefore the dimension
dependent part J of the increments is only determined by the change of the velocity in the specific
dimension. The updates of global and local attractors occur simultaneously for all dimensions.
Hence all dimension dependent parts J have nearly the same distribution and are nearly independent
of each other, because the velocity change is determined only by the updates of the attractors and
the two random variables of the movement equations which belong to that specific dimension.
That is the reason for averaging the expectation of random variables of type J and the covariance
of random variables B and J over all stagnating dimensions. Therefore $\mu_J$, the actual expectation
of a random variable J, is calculated with the previously described approximation by the following
equations:
The variable $J_{t,d}$ is replaced by $I_{t,d} - B_t$ and then the approximation of $B_t$ is applied.

Additivity of the expected value is applied. In the second part then only the σ-algebra $\mathcal{A}_{T_m}$
remains, because it is a sub-σ-algebra of $\mathcal{A}'_{T_m}$, i. e., $\mathcal{A}_{T_m} \subseteq \mathcal{A}'_{T_m}$.

Additionally $\mathrm{cov}_{B,J,\tau}$, the actual covariance of random variables B and J, is calculated by

All variables $J_{t,d}$ and $B_t$ are replaced by their approximation.

The two nested conditional expectations in the second line can be combined and only $\mathcal{A}_{T_m}$
remains, because it is a sub-σ-algebra of $\mathcal{A}'_{T_m}$. Furthermore the outmost conditional expectation
is replaced by two conditional expectations, where the inner σ-algebra is a sub-σ-algebra of
$\mathcal{A}'_{T_m}$.


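$$= \frac{1}{|D_S|} \sum_{d \in D_S} E\left[ E\left[ \left( E\left[ \frac{1}{|D_S|} \sum_{d' \in D_S} \sum_{t=T_m}^{T_m+\tau-1} I_{t,d'} \,\middle|\, \mathcal{A}'_{T_m} \right] - E\left[ \frac{1}{|D_S|} \sum_{d' \in D_S} \sum_{t=T_m}^{T_m+\tau-1} I_{t,d'} \,\middle|\, \mathcal{A}_{T_m} \right] \right) \cdot \left( \sum_{t=T_m}^{T_m+\tau-1} I_{t,d} - E\left[ \frac{1}{|D_S|} \sum_{d' \in D_S} \sum_{t=T_m}^{T_m+\tau-1} I_{t,d'} \,\middle|\, \mathcal{A}'_{T_m} \right] \right) \,\middle|\, \mathcal{A}'_{T_m} \right] \,\middle|\, \mathcal{A}_{T_m} \right]$$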

The second outmost conditional expectation is applied. The first factor is a constant under this
σ-algebra. Therefore nothing happens there. In the second factor the conditional expectation only
remains for the sum over $I_{t,d}$.

The sum $\sum_{d \in D_S}$ in front of the complete expression is shifted into the conditional expectation.
The first factor does not depend on d. Therefore it can be applied directly on the second factor.
On the one hand the sum is put inside the first conditional expectation of the second factor. On
the other hand the second conditional expectation does not depend on d and therefore it is summed
up $|D_S|$ times, which means that this conditional expectation is effectively multiplied by $|D_S|$.

Admittedly, these calculations can also be done for any other set of random variables which does not
belong to a stagnation phase at all, but then the described simplification cannot be justified. As all
stagnating dimensions act similarly and can be permuted without any effect, also the covariances of
$J_{t,d}$ for different values of d are zero. Therefore there definitely exist random variables
B and J which have no linear dependency. Furthermore, with the chosen approximation for the
variables base increment and dimension dependent increment we can expect sufficient independence
for specific times or time intervals.


4.4 Experimental Verification of the Assumption of Independence 

The previous section presented reasons for the separation of the increments into independent base
increments and dimension dependent increments. This section experimentally shows that increments
which belong to different periods of time fulfill sufficient independence to accept the applicability
of the theorems in Chapter

First some additional estimators for the variance and the sixth central moment related to
Experiment are introduced.
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Due to limited resources only the sphere function fgp^ with N = 2 and N = 3 particles, the high 
conditioned elliptic function fhce with N = 2 and N = 3 particles and the Schwefel’s problem fsch 
with N = 3 particles and two different sets of stagnating dimensions are investigated. As presented 
in Figures and the first negative value for the expectation of the increments is available 
with L = D — 3 = 97 stagnating dimensions with two particles and sphere or high conditioned 
elliptic function, L = D — 7 = 93 stagnating dimensions with three particles and sphere or high 
conditioned elliptic function and L = D — 5 = 95 stagnating dimensions and d* = 1 with three 
particles and Schwefel’s problem. If d* is not equal to one then the remaining function of the 
Schwefel’s problem depends heavily on the number of dimensions. For D = 10 dimensions the 
suitable values are d* = 5 and L = 6, but for D = 100 dimensions already d* = 4 and L = 97 are 
the appropriate values. The measured variances are visualized in FigureIf the increments would 
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Figure 11: Estimaters for the variance of (a) increments (b) base increments cr^ ^ and (c) 

dimension dependent increments resulting from Experiment each line represents a single 
conhguration 
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be completely independent then the resulting graphs would be just lines. As already discussed, the 
distribution of the increments depends heavily on the number of non-stagnating dimensions. In
Figure at approximately steps 40 000 to 75 000 there are temporarily four dimensions which do
not improve their position, but for the sphere function and three particles there generally are seven
non-stagnating dimensions. Therefore, with D = 10 dimensions normally only three stagnating
dimensions appear, because otherwise the expectation of the increments of stagnating dimensions
is positive. The additional temporarily stagnating dimension has a slightly lower logarithmic
potential than the other non-stagnating dimensions, which can be recognized in Figure 4b. Hence
the logarithmic potential of the three obviously stagnating dimensions has a positive drift during
this period of time with a temporary additional stagnating dimension. The temporarily stagnating
dimension also receives that positive drift, but as this is a random process, the logarithmic potential
of this dimension can remain small for some time. At the end of a period with temporary
additional stagnating dimensions the PSO starts running again in the temporarily stagnating
dimension and the other stagnating dimensions quickly lose logarithmic potential. Such periods
appear quite frequently during PSO runs and their lengths vary. The smaller the difference between
the expectation of the increments with the current number of stagnating dimensions and with one
stagnating dimension fewer, the longer those periods can become. The stronger increase of the
measured variances in Figure results from those periods, because during or at the end of those
periods the behavior differs. Over larger periods this effect becomes less important. Figure 11 also
supports that impression: after the heavy increase at the beginning, the variances grow quite
linearly. For the sphere and high conditioned elliptic function with two particles, there is not even a
recognizable increase at the beginning, because the expectations of the increments differ strongly
for L = D − 3 and L = D − 2 stagnating dimensions (compare with Figure) and therefore the periods
with temporary additional stagnating dimensions are very short. Altogether, the increments may be
independent enough in time when large time scales are used.
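
The statement that independent increments lead to linearly growing variances can be illustrated with a small simulation. The following is only a sketch with synthetic Gaussian increments (the drift, the block length and all names are assumptions made for illustration, not data from the experiments): for independent increments the variance of the partial sums divided by the number of steps stays near the variance of a single increment, while increments that are repeated in blocks produce a much larger value.

```python
import random

def variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def partial_sum_variances(runs):
    """For every step t, the variance (across runs) of the sum of the first t increments."""
    sums = []
    for increments in runs:
        total, row = 0.0, []
        for inc in increments:
            total += inc
            row.append(total)
        sums.append(row)
    return [variance(column) for column in zip(*sums)]

# Hypothetical setting: negative drift -0.1, unit variance per increment.
RUNS, STEPS, BLOCK = 1000, 200, 20
rng = random.Random(0)

# Independent increments.
independent = [[rng.gauss(-0.1, 1.0) for _ in range(STEPS)] for _ in range(RUNS)]

# Strongly dependent increments: every drawn value is repeated BLOCK times.
dependent = [
    [v for v in [rng.gauss(-0.1, 1.0) for _ in range(STEPS // BLOCK)] for _ in range(BLOCK)]
    for _ in range(RUNS)
]

var_independent = partial_sum_variances(independent)
var_dependent = partial_sum_variances(dependent)

# Variance per step at the end of the runs: close to 1 for independent increments,
# close to BLOCK for the block-wise dependent increments.
print(var_independent[-1] / STEPS, var_dependent[-1] / STEPS)
```

A curve of `var_independent` over the step number is close to a straight line, which is the behavior the measured variances show on large time scales.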

Taking a deeper look into the proof of Lemma results in the awareness that the independence
in t is used three times. It is used to guarantee that

• the sixth central moment of the sum of the first t values of I is at most C · t^3, where C is some
constant and I is either the base increment plus drift or the dimension dependent increment plus
drift,

• there is a positive probability for finite times that the sum of the increments stays negative and

• the probability to receive a negative sum is larger or equal in the case when we
already know that the sum was negative for smaller values of t.

First the sixth central moment is investigated. For this purpose the previously defined estimators for
the sixth central moment are visualized in Figure 12. The estimators are divided by the third power
of the elapsed time, because their growth relative to this rate should be analyzed. With complete
independence, Figures 12a and 12b should appear similar to Figure 12c, which illustrates that for
all scenarios the normalized sixth central moment of the dimension dependent increment oscillates
around some constant value, as intended. But, similar to the variance, the sixth central moment also
shows a heavy increase in the beginning, which results from periods with temporary additional
stagnating dimensions. Nevertheless, Figures 12a and 12b also suggest that the sixth central
moment can be bounded by some polynomial of degree three, because the normalized estimators
are more or less decreasing or constant, but not increasing. Note that all theorems in this paper
could be adjusted if only constants C > 0 and ε > 0 exist such that a correspondingly weaker
moment bound can be fulfilled.

Figure 12: Estimators for the sixth central moment of (a) increments, (b) base increments and
(c) dimension dependent increments resulting from Experiment, divided by the third power of the
elapsed time; each line represents a single configuration

The other two properties look quite reasonable. 

For finite times the example runs show that there is a possibility that the increments, and also
the base increments and the dimension dependent increments, sum up to negative values. Motivated
by the experimental runs, a precise PSO run can be constructed such that the increments, the base
increments and the dimension dependent increments sum up to negative values. This precise run
can be extended such that for each position some noise is possible, i. e., it is permitted that each
reached position is at most ε apart from the intended position. ε can be chosen so small that
decisions whether a new position is a local or global optimum do not change. The probability to
receive one of those runs is then positive for any finite time.

The property that the probability to receive a negative sum is greater or equal if we
already know that the sum was negative before is also reasonable for the PSO. This is because
the previous sums are mostly even much smaller than zero and therefore the probability to stay
negative when the next increment is added is almost one, while it is less likely to be negative if the
previous sum is already positive.

Therefore the assumption of independence was introduced, to generate provable theorems. 


4.5 Brownian Motion as Approximation for Accumulated Sums 


After Theorem it is proposed that Brownian Motions can be used as approximations on partial 
sums of increments. In this section measured probabilities are compared with probabilities received 
from the approximation with Brownian Motion. 

The probability that a stagnation phase does not end is determined by the probability that sums
of increments of initially stagnating dimensions stay below some bound. Actual maximal values
cannot be measured, because we cannot run experiments for an infinite number of iterations.
Therefore Experiment is used for this purpose. For each initially stagnating dimension the maximal
sum of the increments is calculated by

\[ I_{\max,d} := \max_{T_0 \le t \le T_e} \sum_{t'=T_0}^{t-1} I_{t',d} \]

and (I_{\max,d})_r is written for the evaluation on the r'th run. Certainly 100 000 iterations are far
from infinitely many iterations, but on the one hand an infinite number of iterations cannot be
simulated and on the other hand, as we have negative expectation of the increments, the largest
values of the sums most likely appear in the beginning of the process. For the approximation with
Brownian Motions the expectation and the variance of the increments are needed. For this purpose
the already defined estimators for the expectation and the variance are used. As the variance does
not grow linearly from the beginning, the following bounds for the variance are used:


\[ \sigma^2_{\max} := \max_{1 \le r \le 1000} \frac{\hat{\sigma}^2_{I,100 \cdot r}}{100 \cdot r} \]

\[ \sigma^2_{\min} := \frac{\hat{\sigma}^2_{I,100\,000} - \hat{\sigma}^2_{I,10\,000}}{90\,000} \]

The upper bound is the maximal measured variance per iteration. The variances are calculated
only at every 100th step. The lower bound represents the approximately linear increase,
neglecting the heavy increase in the beginning. After the proof of Theorem it is proposed that
1 − exp(2 (c_s − c_0) μ/σ²) is an approximation of the probability that the maximal sum of a single
dimension stays below c_s − c_0. Accordingly, 1 − exp(2 · x · μ/σ²), with negative expectation μ, can
be assumed to be an approximation of the cumulative distribution function of I_max,d. In Figure 13
the cumulative distribution functions generated by the measured values of μ, σ²_max and σ²_min,
i. e.,

\[ F_{\sigma^2_{\max}}(x) := 1 - \exp(2 \cdot x \cdot \mu / \sigma^2_{\max}) \]

and

\[ F_{\sigma^2_{\min}}(x) := 1 - \exp(2 \cdot x \cdot \mu / \sigma^2_{\min}), \]

and the empirical cumulative distribution function, i. e.,

\[ F_{\mathrm{emp}}(x) := \frac{1}{R \cdot |D_S|} \sum_{r=1}^{R} \sum_{d \in D_S} \mathbb{1}_{(I_{\max,d})_r \le x}, \]

are visualized for two different scenarios. The two scenarios are shown as examples, because
for other scenarios the respective graphs look very similar.

Figure 13: Cumulative distribution functions generated by the measured values of μ, σ²_max and
σ²_min, and the empirical cumulative distribution function, for (a) the sphere function with two
particles and (b) the sphere function with three particles.

The empirical cumulative distribution functions mainly run between the two cumulative
distribution functions generated by the measured expectations and variances. There is some
positive probability that I_max,d is 0. For the visualized scenarios this probability is quite small,
as it is hardly recognizable that the empirical cumulative distribution function starts with some
positive value. For small positive values it may happen that the empirical cumulative distribution
functions take lower values than the lower cumulative distribution function F_{σ²_max}.
Furthermore, the more the values of the cumulative distribution functions tend to 1, the more likely
the empirical cumulative distribution functions take larger values than the larger cumulative
distribution function F_{σ²_min}. Both events occur. The approximation not only supplies positive
values for the probability that I_max,d stays small enough so that stagnation phases do not end,
but this approximation even supplies
probabilities which tend to 1 if the value for the maximal sum is enlarged. The allowed value for
the maximal sum in the model used in this paper is c_s − c_0, because sums after the beginning of a
stagnation phase need to stay below this value. The larger this difference is chosen, the larger is the
probability that a stagnation phase never ends, at least for stagnation phases with one stagnating
dimension. As several stagnating dimensions are connected through the base increment, it is more
likely that the maximal sum stays below some bound if it is already known that for other
dimensions this maximal sum stays below the same bound. Therefore it is assumed that
(1 − exp(2 (c_s − c_0) μ/σ²))^{N_0} is a lower bound for the probability that the maximal sum of
every stagnating dimension stays below c_s − c_0, which implies that a stagnation phase does not
end. The more stagnation phases occur during a single run, the more likely some of the stagnating
dimensions which do not cause the end of stagnation phases have very small logarithmic potential.
Those dimensions can mainly be neglected in calculations of the probability whether a stagnation
phase ends or not. Therefore, the more stagnation phases occur, the more likely those phases last
forever. The probability increases to the value of the case with a single stagnating dimension
if N_0 − 1 dimensions are already that insignificant.
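
The quantities of this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluation code used for the experiments: the function names are hypothetical, the variance estimates are assumed to be given at every 100th step, and the closed form used is the standard distribution of the maximum of a Brownian Motion with negative drift μ and variance σ² per step.

```python
import math

def max_partial_sum(increments):
    """Largest partial sum of the increments; the empty sum counts, so the result is >= 0."""
    best = total = 0.0
    for inc in increments:
        total += inc
        best = max(best, total)
    return best

def variance_bounds(var_by_step, step=100):
    """var_by_step[k] is the estimated variance of the accumulated sum after (k + 1) * step steps.

    Returns (lower, upper): the slope over the last 90% of the run and the
    maximal measured variance per iteration.
    """
    upper = max(v / (step * (k + 1)) for k, v in enumerate(var_by_step))
    last = len(var_by_step) - 1
    first = len(var_by_step) // 10 - 1
    lower = (var_by_step[last] - var_by_step[first]) / (step * (last - first))
    return lower, upper

def brownian_max_cdf(x, mu, sigma2):
    """Approximate P(maximal sum <= x) for drift mu < 0 and variance sigma2 per step."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(2.0 * x * mu / sigma2)

def never_ends_lower_bound(c0, cs, mu, sigma2, n0):
    """Assumed lower bound for the probability that a stagnation phase never ends."""
    return brownian_max_cdf(cs - c0, mu, sigma2) ** n0

# Hypothetical values for mu and sigma2; c0, cs and n0 as in the sphere scenario with N = 3.
print(never_ends_lower_bound(c0=-40.0, cs=-20.0, mu=-0.05, sigma2=1.0, n0=93))
```

Because a larger variance moves the exponent closer to zero, the function generated by the upper variance bound is the lower of the two cumulative distribution functions, which matches the ordering discussed above.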

4.6 Duration of Phases of Type PHx 

In this section it is discussed why the requirement of Theorem, that the durations of phases of type
PHx have finite expectation, is acceptable. Afterwards, distributions of durations of phases of type
PHx are visualized.

In general, the event that no stagnation phase is active means that at least the largest D − N_0
dimensions have quite similar potential, i. e., the absolute difference of the logarithms of their
potentials is at most |c_0| = −c_0. Through the movement equations it is almost always possible
that the absolute value of the velocity in most dimensions is decreased significantly, which
leads to a significantly decreasing potential. This is even possible for D − 1 dimensions. Therefore
it is reasonable that there exist some positive probability p and a finite number of iterations T
such that the probability that the logarithmic potentials of at least N_0 dimensions decrease to
values less than c_0 within T iterations is at least p. Then the waiting time for the event that the
logarithmic potentials of at least N_0 dimensions decrease to values less than c_0 is bounded above
by T times a geometrically distributed random variable with success probability p, which has the
finite expectation T/p. Therefore the expectation of the waiting time for the next stagnation phase
is also bounded by T/p, but this stagnation phase does not necessarily achieve the limits μ, M or
p_0 specified in Definition. The choices of the values N_0, μ, M, and p_0 depend heavily on the
scenario.
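
The bound T/p can be checked with a small simulation. The sketch below assumes the simplest model that is compatible with the argument, namely consecutive blocks of T iterations that each succeed independently with probability exactly p; the values of T and p are hypothetical.

```python
import random

def expected_waiting_bound(T, p):
    """Expectation of T times a geometric random variable with success probability p."""
    return T / p

def simulate_waiting_time(T, p, rng):
    """Number of iterations until the first successful block of T iterations."""
    blocks = 1
    while rng.random() >= p:
        blocks += 1
    return blocks * T

rng = random.Random(1)
T, p = 100, 0.2  # hypothetical values
samples = [simulate_waiting_time(T, p, rng) for _ in range(20000)]
mean_waiting_time = sum(samples) / len(samples)
print(mean_waiting_time, expected_waiting_bound(T, p))
```

If the true success probability of a block is larger than p, the simulated waiting times only become shorter, so T/p remains an upper bound for the expectation.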

For instance, the sphere function f_sph or the high conditioned elliptic function f_hce with two
particles can be analyzed. As already discussed, it only matters how many dimensions are
stagnating. Therefore all stagnation phases are good stagnation phases with equal parameters μ, M,
and p_0 if N_0 = D − 3 dimensions are stagnating. If more than N_0 dimensions are stagnating,
then the expectation of the increments is positive, which finally leads to a smaller number of
stagnating dimensions. This leads to the assessment that T/p is also a bound for the expected
waiting time until the next good stagnation phase for this scenario.

In contrast, Schwefel's problem f_sch is not that easy to handle, because the behavior of
the increments depends on the set of stagnating dimensions. To guarantee suitable values for N_0,
μ, M and p_0, most subsets of the set of dimensions need to be analyzed as to whether they may
appear as sets of stagnating dimensions or not. As there are 2^D subsets, this cannot be done
within acceptable computing time if D becomes large. Nevertheless, N_0 can be chosen a
little bit lower so that additional, not only temporary, stagnating dimensions are allowed. Without
knowing the exact values, μ and M can be chosen as the maximal value over all good stagnation
phases and p_0 can be chosen as the minimal value over all good stagnation phases. The
experiments also show that stagnation appears for Schwefel's problem as well.

For all analyzed scenarios the parameters N_0, c_0, c_s, μ, M and p_0 can be chosen such that all
observed stagnation phases are good stagnation phases, because the smaller c_0 is, i. e., the larger
the absolute value of c_0 is, the more unlikely are stagnation phases with positive expectation of
increments. In particular, this implies a* = dj and /3j = /3j for every i, which is assumed from now 
on. 

The following experiment supplies data for actual stagnation phases, because a usual
initialization is used.


Experiment 3. All functions are evaluated with 500 000 iterations. The number of particles N,
the number of dimensions D and the objective functions vary. Usual initialization is used, i. e.,
Algorithm can be used for position initialization, but L, the number of initially stagnating
dimensions, is set to zero, and all velocities are initially set to zero. Each tested configuration is
started R = 500 times with different seeds. Let T_e be 500 000, the number of time steps of the
evaluated process. Δt is set to 100.

The following abbreviations are defined:

\[ D_{X,i} := \{ r \in \{1, \ldots, R\} : i = 0 \lor (\beta_{i-1})_r < T_e \} \]

represents the set of runs such that the i'th phase of type PHx has started within the tested
number of iterations.

\[ D_{Y,i} := \{ r \in \{1, \ldots, R\} : (\alpha_i)_r < T_e \} \]

represents the set of runs such that the i'th phase of type PHy or PHp has started within the
tested number of iterations. For all tested scenarios D_{X,i} equals D_{Y,i}, which means that no test
run of any tested scenario ended in a phase of type PHx. |D_{X,0}| = R = 500, because each test run
starts with a phase of type PHx. Furthermore,

\[ F_{X,i}(x) := \frac{1}{|D_{X,i}|} \sum_{r \in D_{X,i}} \mathbb{1}_{(X_i)_r \le x} \]


represents the empirical distribution of the phase lengths X_i. Figure 14 shows this empirical
distribution, obtained from Experiment 3, for the durations of a phase of type PHx with the
sphere function f_sph, N = 3 particles, D = 100 dimensions, N_0 = D − 7 = 93, c_0 = −40 and
c_s = −20. This configuration is chosen as an example. For all configurations the durations of a
phase of type PHx are largest for the initial phase of type PHx, which starts at iteration 0. At the
beginning, all dimensions have nearly the same potential. Therefore the logarithmic potentials of
all dimensions are not that far from zero. This implies that N_0 dimensions need to decrease their
logarithmic potential to c_0 = −40 or lower. After the end of a stagnation phase it is only known
that fewer than N_0 of the initially stagnating dimensions remain whose logarithmic potential has
stayed below c_s = −20 since the start of the stagnation phase. Nevertheless, there can be
other dimensions which by now already have low logarithmic potential, i. e., less than c_0 or c_s.
There might also be dimensions which did not cause the end of the stagnation phase and still have
low logarithmic potential. Additionally, if the logarithmic potential of a dimension is larger than
c_s, then that logarithmic potential is not almost zero. Instead, it most likely is just a little bit larger
than c_s. Altogether, after stagnation phases there are still dimensions with much lower logarithmic
potential than they most likely have immediately after the initialization of the process. Therefore
the data conforms with these theoretical considerations. For this large number of stagnating
dimensions N_0 = 97 it is also reasonable that the graphs of the empirical distributions move even
further to the left, because it becomes more and more likely that the logarithmic potential of some
of the stagnating dimensions is much smaller than c_0 and therefore does not delay the start of the
next stagnation phase. Further data on the existence of phases of type PHx and some statistical
values can be found in Appendix B.

Figure 14: Empirical distribution of phase lengths X_i with sphere function, N = 3 particles,
D = 100 dimensions, N_0 = 93, c_0 = −40 and c_s = −20
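
The empirical distribution of the phase lengths can be computed directly from recorded stopping times. The following sketch assumes that for each run the stopping time at which the previous stagnation phase ended and the measured phase lengths are available as plain lists; the function names and the sample numbers are hypothetical.

```python
def started_runs(beta_prev, t_end):
    """Indices of the runs in which the i'th phase of type PHx has started.

    beta_prev[r] is the end of the previous stagnation phase in run r,
    or None for the initial phase (i = 0), which always starts.
    """
    return [r for r, b in enumerate(beta_prev) if b is None or b < t_end]

def phase_length_distribution(lengths, x):
    """Fraction of the given phase lengths that are at most x."""
    if not lengths:
        raise ValueError("no run has started this phase")
    return sum(1 for v in lengths if v <= x) / len(lengths)

# Hypothetical phase lengths of four runs.
lengths = [120, 300, 300, 950]
print(phase_length_distribution(lengths, 300))
```

Only the runs returned by `started_runs` contribute a phase length, which is why the distribution is normalized by the size of that set instead of by the total number of runs.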

4.7 Swarm Converges 

In this section it is discussed why the swarm converges almost surely for the tested scenarios. 

On the one hand, the chosen parameters χ = 0.72984 and c_1 = c_2 = 1.496172, as proposed in [3],
are known to be good, because a rough theoretical analysis shows that convergence can be expected.
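
For orientation, the role of these parameters in the movement equations can be sketched as follows. This is a minimal illustration of the commonly used update rule, not the implementation used for the experiments: the initialization range, the immediate update of the global attractor after each particle move and all function names are assumptions; only the parameter values and the zero initial velocities are taken from the text.

```python
import random

CHI = 0.72984        # factor applied to the old velocity
C1 = C2 = 1.496172   # weights of the local and the global attractor

def sphere(x):
    return sum(v * v for v in x)

def pso(f, n_particles, dims, iterations, rng, lo=-100.0, hi=100.0):
    """Run a basic PSO and return the global attractor and its function value."""
    pos = [[rng.uniform(lo, hi) for _ in range(dims)] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]  # velocities start at zero
    local = [p[:] for p in pos]
    local_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda n: local_val[n])
    best, best_val = local[g][:], local_val[g]
    for _ in range(iterations):
        for n in range(n_particles):
            for d in range(dims):
                r, s = rng.random(), rng.random()
                vel[n][d] = (CHI * vel[n][d]
                             + C1 * r * (local[n][d] - pos[n][d])
                             + C2 * s * (best[d] - pos[n][d]))
                pos[n][d] += vel[n][d]
            val = f(pos[n])
            if val < local_val[n]:
                local[n], local_val[n] = pos[n][:], val
            if val < best_val:  # assumed: global attractor is updated immediately
                best, best_val = pos[n][:], val
    return best, best_val

print(pso(sphere, n_particles=3, dims=2, iterations=200, rng=random.Random(0))[1])
```

With few particles and many dimensions the same loop can be used to observe the stagnation effect discussed in this paper, although ordinary floating point numbers will underflow long before the very small potentials reported here are reached.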

On the other hand, the experiments also supply information which leads to the assumption of
convergence. As assumed, the logarithmic potential of stagnating dimensions tends to minus infinity
with linear drift. For all recently mentioned scenarios, the logarithm of the potential of all
dimensions also tends to minus infinity with linear drift. The derivative for a stagnating dimension
d_1 finally stays quite constant. Therefore the potential in dimension d_1 is approximately the
absolute value of the velocity in that dimension multiplied by that constant:

\[ \Phi(t, d_1) \approx \max_{n \in \{1,\ldots,N\}} |V_t^{n,d_1}| \cdot \mathrm{const}. \]

The logarithm of the potential is then the logarithm of the absolute value of the velocity plus the
logarithm of a constant value:

\[ \log(\Phi(t, d_1)) \approx \log\Bigl( \max_{n \in \{1,\ldots,N\}} |V_t^{n,d_1}| \Bigr) + \log(\mathrm{const}) \]

This implies that the logarithm of the absolute values of the velocities of stagnating dimensions
also tends to minus infinity with linear drift. For a non-stagnating dimension d_2 the first derivative
is approximately zero, but the second derivatives of all used functions are positive definite matrices.
Therefore it can be assumed that the potential of non-stagnating dimensions can be approximated
by the square of the velocity multiplied by a constant:

\[ \Phi(t, d_2) \approx \max_{n \in \{1,\ldots,N\}} |V_t^{n,d_2}|^2 \cdot \mathrm{const}. \]

The logarithm of the potential is then two times the logarithm of the absolute value of the velocity
plus the logarithm of a constant value:

\[ \log(\Phi(t, d_2)) \approx 2 \cdot \log\Bigl( \max_{n \in \{1,\ldots,N\}} |V_t^{n,d_2}| \Bigr) + \log(\mathrm{const}) \]

This implies that the logarithm of the absolute values of the velocities of non-stagnating
dimensions also tends to minus infinity with linear drift.

Therefore the absolute values of the velocities can be approximately bounded by v_0 · c^t
with positive constants v_0 ∈ ℝ and c ∈ ]0, 1[. The maximal distance which can be covered by a
particle beginning at some time T is then bounded by

\[ \sum_{t=T}^{\infty} v_0 \cdot c^t = v_0 \cdot c^T \cdot \sum_{t=0}^{\infty} c^t = \frac{v_0 \cdot c^T}{1 - c}. \]

This bound is a finite value which tends to zero if T tends to infinity. Therefore the positions of the
particles will tend to constant positions. This position is indeed the limit of the global attractor,
because otherwise the velocity of a particle would stay approximately as large as the difference
between its current position and the global attractor and would not tend to zero.
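
The closed form of this bound can be verified numerically. The sketch below uses hypothetical values for v_0 and c and compares a long partial sum of the geometric series with the closed form v_0 · c^T / (1 − c).

```python
def travel_bound(v0, c, T):
    """Closed form of the sum of v0 * c**t over all t >= T, for 0 < c < 1."""
    return v0 * c ** T / (1.0 - c)

def partial_travel_sum(v0, c, T, terms):
    """Sum of the first `terms` summands of the same series."""
    return sum(v0 * c ** t for t in range(T, T + terms))

# Hypothetical constants: v0 = 1 and c = 0.9.
print(travel_bound(1.0, 0.9, 10), partial_travel_sum(1.0, 0.9, 10, 5000))
print(travel_bound(1.0, 0.9, 100))
```

The second printed value illustrates that the remaining distance shrinks geometrically with the starting time T.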

Another point of view is that all investigated functions are strictly convex, i. e.,

\[ f(a \cdot x + (1 - a) \cdot y) < a \cdot f(x) + (1 - a) \cdot f(y) \]

for all a ∈ ]0, 1[ and x, y ∈ ℝ^D, x ≠ y. Usually the positions are initialized in a closed and
bounded set. Therefore the maximal function value for the initial global attractor f(G_0^1) is
bounded. As all investigated functions can be written as x^T · A · x with a positive definite matrix
A, the set of points with lower function values than the worst possible initial global attractor is
bounded too. With the theorem of Bolzano-Weierstraß we obtain that (G_t^1)_{t ∈ ℕ_0} contains
convergent subsequences. If there is more than one accumulation point, then the function values of
all accumulation points need to be equal, because the functions are continuous and therefore
points with a larger function value than the lowest function value of any accumulation point
cannot remain as accumulation points. But if there are different accumulation points, then the
particles almost surely visit the area between the accumulation points. As the investigated
functions are strictly convex, the points between two points with equal function value have
strictly lower function values than the accumulation points. After one of those points in between
is visited, the global attractor will change to that point and will never return to the surroundings
of the supposed accumulation points, because the function values in this area are larger than the
encountered value of the current point. Therefore the sequence of the global attractor can have at
most one accumulation point almost surely. The theorem of Bolzano-Weierstraß guarantees that
there is at least one accumulation point. This implies that there is exactly one accumulation point
G_lim almost surely. If the global attractor does not converge to that accumulation point, then
there exists an ε > 0 such that the set I := {t ∈ ℕ_0 : ‖G_t^1 − G_lim‖ ≥ ε} does not have a finite
number of elements. With the theorem of Bolzano-Weierstraß, the subsequence containing only
the indices in I also has at least one accumulation point. This accumulation point cannot be equal
to G_lim, but this is a contradiction to the statement that there is only a single accumulation
point. Therefore the complete sequence of the global attractors converges almost surely to the
limit point G_lim. Note that this limit point is not deterministic. The local attractors also need to
converge to the same limit point almost surely, because otherwise the same contradictions appear
as with multiple accumulation points of the global attractor. With usual parameter choices, the
positions then also converge to the limit point of the global attractor and the velocities tend to
zero.

Nevertheless, the property that a swarm converges almost surely depends on the objective
function, the number of particles, the number of dimensions and the parameters of the PSO,
and needs to be checked if Theorem is to be applied.


4.8 Empirical Distributions of Stopping Times α_i and β_i and of Logarithmic
Potential

This section provides additional data concerning Experiment 3. First, empirical distributions of the
stopping times α_i and β_i are provided for two different scenarios. Afterwards, empirical
distributions of the logarithmic potential at the end of the test runs are visualized.

The empirical distributions of the stopping times α_i and β_i are defined by the following
expressions:

\[ F_{\alpha,i}(x) := \frac{1}{R} \sum_{r=1}^{R} \mathbb{1}_{(\alpha_i)_r \le x} \]

\[ F_{\beta,i}(x) := \frac{1}{R} \sum_{r=1}^{R} \mathbb{1}_{(\beta_i)_r \le x} \]

Figure 15 shows the empirical distributions for the configuration with sphere function, N = 3
particles, D = 100 dimensions, N_0 = 93, c_0 = −40 and c_s = −20, and Figure 16 shows the
empirical distributions for the configuration with sphere function, N = 3 particles, D = 8
dimensions, N_0 = 1, c_0 = −40 and c_s = −20. The first phase of type PHx lasts longer with more
dimensions. Also, it is less likely that stagnation phases persist. Especially in the case of D = 100
dimensions, most stagnation phases in the beginning are quite short, because in 93 dimensions the
logarithmic potential needs to stay small. For D = 8 only the logarithmic potential of a single
dimension needs to stay small and therefore the probability that a stagnation phase persists is
much higher. For D = 100 it also becomes more and more likely that stagnation phases persist,
because there are more and more dimensions whose logarithmic potential remains
far below c_0. This can also be observed in Figures 15 and 16. The leftmost line represents the
empirical distribution of α_0, which equals the length of the first phase of type PHx. For D = 100
the first empirical distributions of the stopping times α_i and β_i reach the value 1 very early. For
D = 8 the empirical distribution of β_0, which is represented by the second line from the left,
indicates the end of the first stagnation phase. In contrast to the previous scenario, this empirical
distribution does not reach the value 1. That means that in some runs the first stagnation phase
does not end within the tested iterations. A further observation is that phases of type PHx have
quite short duration compared with phases of type PHy. Already in Figures 15a and 16a the
empirical distributions of the end of a stagnation phase and of the start of the next one have only
a small distance. In Figures 15b and 16b they are already merging to a single line.

Figure 15: Empirical distribution of the stopping times α_i and β_i with sphere function,
N = 3 particles, D = 8 dimensions, N_0 = 1, c_0 = −40 and c_s = −20

Figure 16: Empirical distribution of the stopping times α_i and β_i (F_{α,i}, F_{β,i}) with sphere
function, N = 3 particles, D = 100 dimensions, N_0 = 93, c_0 = −40 and c_s = −20

Another indication for the existence of unlimited stagnation phases is the empirical distribution
of the final value of the logarithmic potential, which is again the fraction of the R runs for which
the respective condition holds.

To obtain more meaningful figures, the empirical distributions are not visualized for single
dimensions, but for the dimension with the d'th largest final logarithmic potential. For this
purpose, a sum over d' = 1, ..., D counts how many dimensions of run r have a final logarithmic
potential less than or equal to x. A single run then only has an effect if at least d dimensions with
final logarithmic potential less than or equal to x appear. Figures 17 and 18 show the respective
empirical distributions for two scenarios.
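
One way to evaluate such a distribution is sketched below. It rests on an assumption about the convention: "d'th largest" is read as the d'th entry after sorting the final logarithmic potentials of a run in decreasing order, so d = 1 refers to the dimension with the largest potential. The function name and the sample data are hypothetical.

```python
def dth_largest_distribution(final_log_potentials, d, x):
    """Fraction of runs whose d'th largest final logarithmic potential is at most x.

    final_log_potentials holds one list per run with the final logarithmic
    potential of every dimension; d is 1-based.
    """
    hits = 0
    for run in final_log_potentials:
        dth_largest = sorted(run, reverse=True)[d - 1]
        if dth_largest <= x:
            hits += 1
    return hits / len(final_log_potentials)

# Two hypothetical runs with four dimensions each.
data = [[-1.0, -600.0, -700.0, -2.0],
        [-3.0, -550.0, -30.0, -900.0]]
print(dth_largest_distribution(data, d=3, x=-580.0))
```

Sorting per run makes the curves comparable across runs even though the identity of the stagnating dimensions differs from run to run.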



Figure 17: Empirical distribution of the final value of the logarithmic potential with sphere 
function, N = 2 particles and D = 100 dimensions 


As already proposed, the scenario with the sphere function and two particles, which is displayed
in Figure 17, always experiences D − 3 stagnating dimensions. The three dimensions with the
largest logarithmic potential at the end differ significantly from the other dimensions. Their
logarithmic potential is not that far from zero. In contrast, the logarithmic potential of the
remaining 97 dimensions is always smaller than −500 and in most cases much smaller than this
value. This small value means that the potential of the fourth largest dimension is more than a
hundred orders of magnitude smaller than the potential of the largest dimension. The minimal
logarithmic potential value of the third largest dimension is approximately −90 and the minimal
logarithmic potential value of the second largest dimension is approximately −30. Admittedly,
these values are also very small compared to the largest dimension, but, as already discussed,
dimensions can become temporarily stagnating. Therefore small logarithmic potential values can
appear even for the dimension with the second largest potential. Nevertheless, the difference
between the third and the fourth largest logarithmic potential value is huge. In contrast, the
empirical distributions of the dimensions with lower logarithmic potential are quite similar. They
only move a little bit to the left.



Figure 18: Empirical distribution of the final value of the logarithmic potential with Schwefel's
problem, N = 3 particles and D = 100 dimensions


For Schwefel's problem the number of actually stagnating dimensions depends on which
dimensions are stagnating and which are not. There are sets with three non-stagnating dimensions
such that the expectation of the increments becomes negative, but there are also sets with five
non-stagnating dimensions such that, if any of the dimensions is removed from the set of
non-stagnating dimensions, the expectation of the increments becomes positive. Figure 18
represents the empirical distributions for Schwefel's problem with three particles and 100
dimensions. There it can be observed that the dimensions with the fourth and fifth largest
logarithmic potential value neither always belong to the set of stagnating dimensions nor always
belong to the set of non-stagnating dimensions. Instead, there is some positive probability that the
dimension which has the fourth or the fifth largest logarithmic potential value belongs to the set of
non-stagnating dimensions. This is illustrated by the heavy increase of the empirical distribution
in the close surroundings of the potential value zero. For the remaining cases the fourth or the
fifth largest logarithmic potential value behaves similarly to the other stagnating dimensions. It is
also reasonable that the fourth and the fifth largest logarithmic potential values do not behave
identically, because the distribution of the increments depends on the set of stagnating dimensions.
For the sixth and further dimensions all stagnation phases are merged, while the fifth largest
potential only counts as a stagnating dimension if four or fewer dimensions are non-stagnating.
The cases with exactly five non-stagnating dimensions are removed for this dimension.

4.9 Long Time Experiments 

Further, more time-consuming experiments are done with a larger number of particles. Note that if
the number of steps is doubled, then the runtime sometimes increases by a factor of four, because
the longer the experiment takes, the more precision is required. This is because the values become
more diverse the longer an experiment takes. The number of dimensions also influences the
runtime more than linearly. To avoid irregularities, the precision needs to be increased, which is
done automatically on demand by the software used. As these experiments are run for
100 000 000 iterations, only single runs are executed.

For example, the configuration with four particles and the sphere function was investigated.
With D = 40 dimensions and 100 000 000 iterations it appears that after a long period of time
the smallest occurring logarithmic potential among the dimensions stays quite constant. However,
the dimension which is responsible for the lowest value of the logarithmic potential changes over
time. The value for the lowest logarithmic potential Ψ(t, d) stays approximately in the range
−1 000 to −2 000. Periods of time with temporarily stagnating dimensions can even last up to
10 000 000 iterations. However, the largest overall potential Φ(100 000 000, d) has decreased to
approximately 2^−8 000. It even appears that each dimension becomes significant from time to
time, which leads to the impression that four particles finally will succeed in optimizing the
sphere function with 40
dimensions. The absolute values of the global attractor for all dimensions at the end are in the 
range from to which strengthens this impression. 

With $D = 400$ dimensions and 100 000 000 iterations, it cannot yet be determined whether
there are actually stagnating dimensions. The lowest logarithmic potential $\Psi(t, d)$ reaches
approximately $-10\,000$, but no significant separation of the logarithmic potentials of different
dimensions can be observed that would indicate stagnating and non-stagnating dimensions.
Some dimensions have been stagnating almost since the beginning of the test run and have very small
logarithmic potential, but other dimensions regained logarithmic potential after a very large number
of iterations. The maximal potential value in the last iterations has not decreased as extremely.
Potential values $\Phi(t, d)$ of up to approximately are reached in the last 10% of the measured
iterations. The absolute values of the global attractor for all dimensions at the end are in the range 
from 2“^^^ to 2^. This means that the position of the global attractor in some dimensions at the 
end is still in the range of the initialized positions. This certainly suggests permanently
stagnating dimensions, but, as periods with temporarily stagnating dimensions can last even
longer than in the case with 40 dimensions, no prediction can be made as to whether there are
stagnating dimensions in this case. On the one hand, a larger number of iterations needs
to be tested; on the other hand, more than a single test run needs to be executed. This is not
easy, because this single test run already took some months of computing time: the precision
finally needed is more than 10 000 bits, which makes the calculations very time consuming.

Therefore it is not clear whether there is a limit on the number of dimensions beyond which
the PSO with the chosen parameters and four particles finally stagnates at non-optimal points.

5 Conclusion


Altogether, a theoretical basis was introduced to formalize stagnation during runs of the PSO. For
some functions, evidence was provided that the chosen theoretical model is applicable. In particular,
for the sphere function and its scaled version, the high conditioned elliptic function, it was shown
that with two particles there are at most three dimensions that do not stagnate permanently;
all other dimensions eventually stagnate. For three particles, the number of dimensions that do
not stagnate permanently increases to seven. For three particles, the PSO parameters used
in this paper, the sphere function and eight dimensions, there is always a dimension that eventually
stops being optimized. For other functions, the number of stagnating dimensions can differ
significantly from the configuration with the sphere function. An example is the diagonal function:
even with four particles there are only two dimensions that do not stagnate permanently.

Furthermore, the presented framework is not only applicable to the presented version of the
PSO. It also works with different parameters for the movement equations or even with
changed rules for movement or updates. For example, the global attractor could be updated only
once per iteration instead of after each particle. Other iteration-based optimization algorithms
can also be analyzed if a suitable potential can be defined.
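
The two update rules for the global attractor can be sketched as follows. This is a minimal illustration and not the implementation used for the experiments: the function names, the initialization ranges, the strict comparison for attractor updates and the parameter values `chi`, `c1`, `c2` (a commonly used setting from the literature) are assumptions.

```python
import random


def sphere(x):
    """Sphere function: sum of squared coordinates."""
    return sum(xi * xi for xi in x)


def pso(f, dim, particles, iterations, update_per_particle=True,
        chi=0.72984, c1=1.496172, c2=1.496172, seed=0):
    """Minimal PSO; returns the global attractor and its function value."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-100, 100) for _ in range(dim)] for _ in range(particles)]
    vel = [[rng.uniform(-50, 50) for _ in range(dim)] for _ in range(particles)]
    local = [p[:] for p in pos]                 # local attractors
    local_val = [f(p) for p in pos]
    best = min(range(particles), key=lambda n: local_val[n])
    g, g_val = local[best][:], local_val[best]  # global attractor
    for _ in range(iterations):
        pending, pending_val = g, g_val         # best point seen so far
        for n in range(particles):
            for d in range(dim):
                r, s = rng.random(), rng.random()
                vel[n][d] = (chi * vel[n][d]
                             + c1 * r * (local[n][d] - pos[n][d])
                             + c2 * s * (g[d] - pos[n][d]))
                pos[n][d] += vel[n][d]
            val = f(pos[n])
            if val < local_val[n]:
                local[n], local_val[n] = pos[n][:], val
            if val < pending_val:
                pending, pending_val = pos[n][:], val
                if update_per_particle:
                    # variant 1: later particles of this iteration already see it
                    g, g_val = pending, pending_val
        # variant 2: the swarm sees the improvement only after the full iteration
        g, g_val = pending, pending_val
    return g, g_val


for mode in (True, False):
    print(mode, pso(sphere, dim=3, particles=3, iterations=200,
                    update_per_particle=mode)[1])
```

Both variants use the same movement equations; only the moment at which an improved position becomes visible to the other particles differs.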

On the one hand, it can be analyzed whether an optimization algorithm can find local optima
for specified functions, or whether there are always some dimensions in which the
coordinates are not optimized. On the other hand, if convergence is present and
if comparable definitions of a potential can be made, for example the definition used here for PSO
algorithms, then the speed of convergence can be compared by comparing the increments of the
logarithmic potentials. The number of stagnating dimensions can also serve as a comparison criterion.

In this paper only functions are analyzed that do not change their shape, no matter how the
coordinates are scaled. For example, for two vectors $x, y \in \mathbb{R}^D$ we have
$f(a \cdot x) < f(a \cdot y) \Leftrightarrow f(x) < f(y)$ for any $a \in \mathbb{R} \setminus \{0\}$.
The established framework can also handle
more complicated functions, for instance functions with bounded third derivatives. Around each local
optimum, such functions can be approximated by their Taylor approximation of degree two. If
the second derivative matrix is not only positive semidefinite but also positive definite, then this
approximation is sufficient to determine whether the PSO eventually stops tending to that local
optimum. If the PSO running on the approximation almost surely stagnates and does not end up in
the single optimum, then the PSO will also stagnate if it tends to that local optimum of the original
function, because higher derivatives become less important the closer the particles are
to the local optimum. In contrast, even if stagnation is almost surely impossible for the Taylor
approximation at every local optimum, the PSO can still stagnate earlier, while higher derivatives
are still significant.
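
The scaling property required of the analyzed functions can be checked numerically. The sketch below tests whether scaling both arguments by the same nonzero factor preserves their order under f; the helper names and the counterexample function `shifted` are illustrative assumptions, and integer arithmetic is used so that rounding cannot influence the comparison.

```python
import random


def sphere(x):
    """Sphere function with its optimum at the origin."""
    return sum(xi * xi for xi in x)


def shifted(x):
    """Hypothetical counterexample: optimum at (1, ..., 1) instead of the origin."""
    return sum((xi - 1) ** 2 for xi in x)


def order_preserved(f, x, y, a):
    """Check f(a*x) < f(a*y)  <=>  f(x) < f(y) for one triple (x, y, a)."""
    ax = [a * xi for xi in x]
    ay = [a * yi for yi in y]
    return (f(ax) < f(ay)) == (f(x) < f(y))


def looks_scale_invariant(f, dim=5, trials=1000, seed=1):
    """Randomized test with integer vectors and nonzero integer scale factors."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(-10, 10) for _ in range(dim)]
        y = [rng.randint(-10, 10) for _ in range(dim)]
        a = rng.choice([-1, 1]) * rng.randint(1, 10)
        if not order_preserved(f, x, y, a):
            return False
    return True


print(looks_scale_invariant(sphere))
print(order_preserved(shifted, [2], [-1], -1))
```

For the sphere function, scaling by a multiplies every function value by a squared, so the order never changes. For the shifted function, the points 2 and -1 swap their order under a = -1, which is why such a function needs the Taylor argument above rather than the scaling property.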

Additionally, if stagnation is not present, convergence to local optima is still not guaranteed,
at least not by the present analysis. Nevertheless, it is expected that if stagnation does not occur,
each dimension reaches large logarithmic potential from time to time. If dimensions have
large logarithmic potential, it is assumed that these dimensions optimize their positions. In
this paper, stagnation is defined for some minimal number of stagnating dimensions $N_0$, which is
appropriate for showing that a local optimum is not reached. To guarantee convergence to local
optima for at least $D - N_0$ dimensions, stagnation phases need to be defined differently, with a
maximal (not minimal) number of stagnating dimensions. This positive result cannot be extended by
analyzing only the derivatives at local optima, because stagnation can occur earlier.
Nevertheless, if there is no positive value of $N_0$, the minimal number of stagnating dimensions,
such that stagnation is observed, then convergence to local optima is expected.

Final open questions in this context, which could be analyzed with the given framework,
are the following: Is there a number of particles such that it can be guaranteed that at least the
sphere function, continuous functions or differentiable functions will almost surely be optimized
perfectly? Is there a fixed number of particles such that this can be guaranteed for some
bounded number of dimensions? Is there a minimal or maximal number of stagnating dimensions
for a fixed number of particles, fixed PSO parameters and a set of objective functions?



References 

[1] H. Bauer. Probability Theory, volume 23 of Studies in Mathematics. De Gruyter, 1996. 

[2] P. Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: 
Probability and Statistics. John Wiley & Sons Inc., New York, second edition, 1999. A Wiley- 
Interscience Publication. 

[3] A. Carlisle and G. Dozier. An off-the-shelf PSO. In Proc. Particle Swarm Optimization 
Workshop, pages 1-6, 2001. 

[4] M. Clerc and J. Kennedy. The particle swarm - explosion, stability, and convergence in a 
multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6:58-73, 
2002. 

[5] R. Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and
Probabilistic Mathematics. Cambridge University Press, 2010.

[6] R. C. Eberhart and J. Kennedy. A new optimizer using particle swarm theory. In Proc. 6th 
International Symposium on Micro Machine and Human Science, pages 39-43, 1995. 

[7] M. Jiang, Y. P. Luo, and S. Y. Yang. Particle swarm optimization - stochastic trajectory
analysis and parameter selection. In F. T. S. Chan and M. K. Tiwari, editors, Swarm
Intelligence - Focus on Ant and Particle Swarm Optimization, chapter 17, pages 179-198. I-TECH
Education and Publishing, Vienna, 2007.

[8] M. Jiang, Y. P. Luo, and S. Y. Yang. Stochastic convergence analysis and parameter selection 
of the standard particle swarm optimization algorithm. Inf. Process. Lett., 102:8-16, 2007. 

[9] J. Kennedy and R. C. Eberhart. Particle swarm optimization. In Proc. IEEE International 
Conference on Neural Networks, volume 4, pages 1942-1948, 1995. 

[10] P. K. Lehre and C. Witt. Finite first hitting time versus stochastic convergence in particle
swarm optimisation. In L. Di Gaspero, A. Schaerf, and T. Stützle, editors, Advances in
Metaheuristics, OR/CS, pages 1-20. Springer Science+Business Media, New York, USA, 2013.

[11] J. E. Onwunalu and L. J. Durlofsky. Application of a particle swarm optimization algorithm 
for determining optimum well location and type. Computational Geosciences, 14:183-198, 
2010. 

[12] B. K. Panigrahi, Y. Shi, and M.-H. Lim, editors. Handbook of Swarm Intelligence — Concepts, 
Principles and Applications. Springer, 2011. 

[13] K. Ramanathan, V. M. Periasamy, M. Pushpavanam, and U. Natarajan. Particle swarm 
optimisation of hardness in nickel diamond electro composites. Archives of Computational 
Materials Science and Surface Engineering, 1:232-236, 2009. 

[14] T. H. Scheike. A boundary-crossing result for the Brownian Motion. Journal of Applied 
Probability, 29(2), 1992. 



[15] B. I. Schmitt. Convergence Analysis for Particle Swarm Optimization. FAU University Press, 
Erlangen, 2015. Doctoral Thesis. 

[16] M. Schmitt and R. Wanka. Particle swarm optimization almost surely finds local optima. In 
Proc. 15th Genetic and Evolutionary Computation Conference (GECCO), pages 1629-1636, 
2013. 

[17] M. Schmitt and R. Wanka. Particles prefer walking along the axes: Experimental insights 
into the behavior of a particle swarm. In Companion of Proc. 15th Genetic and Evolutionary 
Computation Conference (GECCO), pages 17-18, 2013. 

[18] M. Schmitt and R. Wanka. Particle swarm optimization almost surely finds local optima. 
Theoretical Computer Science, 561:57-72, 2015. 

[19] P. N. Suganthan, N. Hansen, J. J. Liang, K. Deb, Y. P. Chen, A. Auger, and S. Tiwari. 
Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter 
optimization. Technical report, Nanyang Technological University, Singapore, 2005. 

[20] I. C. Trelea. The particle swarm optimization algorithm: Convergence analysis and parameter 
selection. Inf. Process. Lett., 85:317-325, 2003. 

[21] M. P. Wachowiak, R. Smolíková, Y. Zheng, J. M. Zurada, and A. S. Elmaghraby. An approach
to multimodal biomedical image registration utilizing particle swarm optimization. IEEE 
Transactions on Evolutionary Computation, 8:289-301, 2004. 



A Data of Experiment 


Table 1: Values of estimators specified in Experiment 1 for the sphere function $f_{\mathrm{sph}}$

f       N  L  d*  estimators
f_sph   2  9   1    -0.280370    -0.280370    -0.067542     0.212828
f_sph   2  8   1    -0.133841    -0.133843   -0.0742897    0.0595512
f_sph   2  7   1   -0.0265214   -0.0265178   -0.0307508   -0.0042293
f_sph   3  9   1    -0.281658    -0.281658   -0.0653937     0.216264
f_sph   3  8   1    -0.232753    -0.232748   -0.0736407     0.159112
f_sph   3  7   1    -0.175692    -0.175695   -0.0677883     0.107903
f_sph   3  6   1   -0.0944085    -0.094405   -0.0458445     0.048564
f_sph   3  5   1   -0.0436236    -0.043621    -0.026277    0.0173466
f_sph   3  4   1   -0.0202586    -0.020237   -0.0155185    0.0047401
f_sph   3  3   1   -0.0074307   -0.0074679   -0.0091868   -0.0017561


Table 2: Values of estimators specified in Experiment for the high conditioned elliptic function $f_{\mathrm{hce}}$

f       N  L  d*  estimators
f_hce   2  9   1    -0.280370    -0.280370    -0.067542     0.212828
f_hce   2  8   1    -0.133605    -0.133599   -0.0742256    0.0593796
f_hce   2  7   1    -0.026488    -0.026485   -0.0307987   -0.0043110
f_hce   3  9   1    -0.281658    -0.281658   -0.0653937     0.216264
f_hce   3  8   1    -0.232727    -0.232721    -0.073618     0.159109
f_hce   3  7   1    -0.175414    -0.175412   -0.0676028     0.107811
f_hce   3  6   1   -0.0941275   -0.0941295   -0.0457537    0.0483738
f_hce   3  5   1   -0.0436067   -0.0436075   -0.0263199    0.0172867
f_hce   3  4   1   -0.0201859   -0.0201904   -0.0154480    0.0047379
f_hce   3  3   1   -0.0074169   -0.0074541   -0.0091259   -0.0017089


Table 3: Values of estimators specified in Experiment for Schwefel's problem $f_{\mathrm{sch}}$

f       N  L  d*  estimators
f_sch   2  9   1    -0.280217    -0.280217   -0.0675834     0.212634
f_sch   2  9   2    -0.280472    -0.280472   -0.0676385     0.212833
f_sch   2  8   1   -0.0920762   -0.0920796   -0.0622003    0.0298759
f_sch   2  8   3   -0.0258683   -0.0258645   -0.0449234   -0.0190551
f_sch   2  7   1   -0.0052527   -0.0052403   -0.0215513   -0.0162987
f_sch   2  7   4   -0.0275117   -0.0446342   -0.0450705   -0.0175588
f_sch   3  9   1    -0.281479    -0.281479   -0.0654046     0.216074
f_sch   3  8   1    -0.220181    -0.220177   -0.0691141     0.151067
f_sch   3  8   3    -0.166589    -0.166591   -0.0553963     0.111193
f_sch   3  7   1    -0.120368    -0.120371   -0.0522450    0.0681226
f_sch   3  7   4   -0.0585785    -0.058586   -0.0338823    0.0246963
f_sch   3  6   1   -0.0363988    -0.036386   -0.0251476    0.0112512
f_sch   3  6   5   -0.0169291   -0.0169333   -0.0173492   -0.0004201
f_sch   3  5   1   -0.0083174   -0.0083236   -0.0120229   -0.0037055
f_sch   3  5   6   -0.0080515   -0.0086733   -0.0125910   -0.0045394
f_sch   3  4   1   -0.0076700   -0.0116463   -0.0117998   -0.0041299
f_sch   3  4   7   -0.0065029   -0.0118378   -0.0116454   -0.0051425
f_sch   3  3   1   -0.0071813   -0.0118641   -0.0115253   -0.0043441
f_sch   3  3   8   -0.0055640   -0.0114905   -0.0110172   -0.0054532


Table 4: Values of estimators specified in Experiment for the diagonal function $f_{\mathrm{diag}}$

f       N  L  d*  estimators
f_diag  2  9   1    -0.280374    -0.280374    -0.067633     0.212741
f_diag  2  8   1  -6.88·10^-6  -6.33·10^-6    -0.037121    -0.037114
f_diag  3  9   1    -0.281716    -0.281716    -0.065535     0.216181
f_diag  3  8   1  -1.09·10^-6  -1.08·10^-6    -0.027709    -0.027699
f_diag  4  9   1    -0.281934    -0.281934    -0.064976     0.216958
f_diag  4  8   1  -1.58·10^-6  -1.83·10^-6    -0.006380    -0.006364


B Data of Experiment


The following variables are defined, which indicate the minimal, maximal and average values and
the variance observed with Experiment

\[
\min_{X,i} := \inf_{r \in D_{X,i}} (X_i)_r, \qquad
\max_{X,i} := \sup_{r \in D_{X,i}} (X_i)_r,
\]
\[
\mu_{X,i} := \frac{\sum_{r \in D_{X,i}} (X_i)_r}{|D_{X,i}|}, \qquad
\sigma^2_{X,i} := \frac{\sum_{r \in D_{X,i}} \bigl((X_i)_r - \mu_{X,i}\bigr)^2}{|D_{X,i}|}.
\]
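
As a rough illustration of these four statistics, the following sketch computes them for a list of observed values. The function name `phase_statistics`, the dictionary keys and the sample numbers are illustrative assumptions. The handling of an empty set (minimum ∞, maximum −∞, mean and variance reported as 0) mirrors the corresponding rows of Table 8 rather than a stated definition.

```python
import math


def phase_statistics(values):
    """Count, minimum, maximum, mean and population variance of a list of values."""
    n = len(values)
    if n == 0:
        # inf of the empty set is +infinity, sup is -infinity (assumed convention)
        return {"count": 0, "min": math.inf, "max": -math.inf,
                "mean": 0.0, "variance": 0.0}
    mean = sum(values) / n
    # the variance divides by the number of values, not by n - 1
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"count": n, "min": min(values), "max": max(values),
            "mean": mean, "variance": variance}


print(phase_statistics([1000]))             # single observation: variance 0
print(phase_statistics([]))                 # empty set of observations
print(phase_statistics([400, 900, 2000]))   # hypothetical sample values
```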

Table 5: Evaluation of data from Experiment with $c_0 = -40$, $c_s = -20$ and function $f_{\mathrm{sph}}$

N    D  N_0  |D_{X,0}|  min_{X,0}  max_{X,0}   μ_{X,0}    σ²_{X,0}  |D_{X,1}|
2    4    1        500        400      9 000   2 017.6   1 244 090        351
2   10    7        500      1 000      9 900   3 239.6   1 766 472        460
2  100   97        500      1 500     13 100   4 840.2   2 913 124        493
3    8    1        500        700      7 300   2 597.6   1 482 594        488
3   20   13        500      2 600     17 900   6 369.0   5 246 859        500
3  100   93        500      2 800     22 300  10 319.4   9 648 924        500

N    D  N_0  |D_{X,1}|  min_{X,1}  max_{X,1}   μ_{X,1}    σ²_{X,1}  |D_{X,2}|
2    4    1        351          0      3 600     862.7     518 636        225
2   10    7        460          0     10 500   1 054.3   1 030 785        359
2  100   97        493          0      6 900   1 080.5     773 576        457
3    8    1        488          0      6 200     748.0     715 611        430
3   20   13        500          0      9 700   1 554.8   2 139 917        500
3  100   93        500          0     12 800   2 087.6   4 274 366        500

N    D  N_0  |D_{X,2}|  min_{X,2}  max_{X,2}   μ_{X,2}    σ²_{X,2}  |D_{X,3}|
2    4    1        225          0      5 100     831.1     752 277        138
2   10    7        359          0      4 100     763.0     424 394        256
2  100   97        457          0      6 000     903.5     669 835        365
3    8    1        430          0      4 200     604.7     488 537        348
3   20   13        500          0     13 800   1 231.0   1 885 419        500
3  100   93        500          0     12 600   1 577.4   2 411 149        500


Table 6: Evaluation of data from Experiment with $c_0 = -40$, $c_s = -20$ and function $f_{\mathrm{hce}}$

N    D  N_0  |D_{X,0}|  min_{X,0}  max_{X,0}   μ_{X,0}    σ²_{X,0}  |D_{X,1}|
2    4    1        500        400      6 400   1 948.8   1 016 179        368
2   10    7        500      1 000      7 800   3 191.4   1 570 546        458
2  100   97        500      1 700      9 500   4 481.4   2 077 034        487
3    8    1        500        400      7 700   2 520.6   1 554 436        483
3   20   13        500      1 700     15 100   6 523.2   5 161 462        500
3  100   93        500      3 400     20 800   9 861.0   8 493 979        500

N    D  N_0  |D_{X,1}|  min_{X,1}  max_{X,1}   μ_{X,1}    σ²_{X,1}  |D_{X,2}|
2    4    1        368          0      4 900     892.1     696 813        230
2   10    7        458          0      5 900   1 018.3     784 991        371
2  100   97        487          0      7 700   1 092.6     932 923        448
3    8    1        483          0      5 400     767.9     665 658        424
3   20   13        500          0     11 900   1 563.6   2 322 035        500
3  100   93        500          0     11 600   1 950.8   4 057 459        500

N    D  N_0  |D_{X,2}|  min_{X,2}  max_{X,2}   μ_{X,2}    σ²_{X,2}  |D_{X,3}|
2    4    1        230          0      5 800     764.8     612 455        132
2   10    7        371          0      5 700     827.2     585 970        272
2  100   97        448          0      5 900     989.3     720 108        360
3    8    1        424          0      5 300     612.5     509 301        368
3   20   13        500          0     10 200   1 159.6   1 652 248        500
3  100   93        500          0     13 600   1 478.8   2 400 471        500


Table 7: Evaluation of data from Experiment 3 with $c_0 = -40$, $c_s = -20$ and function $f_{\mathrm{sch}}$

N    D  N_0  |D_{X,0}|  min_{X,0}  max_{X,0}   μ_{X,0}    σ²_{X,0}  |D_{X,1}|
3    8    3        500      1 200     12 900   4 664.6   3 911 887        474
3   20   15        500      2 300     16 000   6 931.2   5 668 627        492
3  100   95        500      2 900     17 500   8 996.0   8 277 224        491

N    D  N_0  |D_{X,1}|  min_{X,1}  max_{X,1}   μ_{X,1}    σ²_{X,1}  |D_{X,2}|
3    8    3        474          0     12 600   1 171.7   2 251 437        371
3   20   15        492          0     15 000   1 270.7   2 516 623        452
3  100   95        491          0     14 100   1 365.4   2 824 870        446

N    D  N_0  |D_{X,2}|  min_{X,2}  max_{X,2}   μ_{X,2}    σ²_{X,2}  |D_{X,3}|
3    8    3        371          0      5 800     990.0   1 037 286        255
3   20   15        452          0     11 300     990.7   1 588 719        367
3  100   95        446          0      6 000     942.2   1 108 582        380

Table 8: Evaluation of data from Experiment with $c_0 = -40$, $c_s = -20$ and function $f_{\mathrm{diag}}$

N    D  N_0  |D_{X,0}|  min_{X,0}  max_{X,0}   μ_{X,0}    σ²_{X,0}  |D_{X,1}|
3    3    1        500        700     13 300   2 557.0   1 845 011          1
3   10    8        500      2 300     26 200   5 722.8   9 167 760          1
4    3    1        500      1 500     28 500   6 533.4  13 461 824         46
4   10    8        500      6 200     36 900  14 990.2  26 613 084         75

N    D  N_0  |D_{X,1}|  min_{X,1}  max_{X,1}   μ_{X,1}    σ²_{X,1}  |D_{X,2}|
3    3    1          1      1 000      1 000   1 000.0           0          0
3   10    8          1        500        500     500.0           0          0
4    3    1         46        300     13 500   3 021.7   9 384 310          4
4   10    8         75        400      8 000   2 809.3   3 977 646         11

N    D  N_0  |D_{X,2}|  min_{X,2}  max_{X,2}   μ_{X,2}    σ²_{X,2}  |D_{X,3}|
3    3    1          0          ∞         -∞       0.0           0          0
3   10    8          0          ∞         -∞       0.0           0          0
4    3    1          4      1 400     16 600   6 825.0  33 761 875          0
4   10    8         11        900     13 100   3 800.0  11 474 545          0
